DEC 4000 AXP Service Guide Order Number: EK–KN430–SV. B01 Digital Equipment Corporation Maynard, Massachusetts...
The Reader’s Comments form at the end of this document requests your critical evaluation to assist in preparing future documentation. The following are trademarks of Digital Equipment Corporation: Alpha AXP, AXP, DEC, DECchip, DECconnect, DECdirect, DECnet, DECserver, DEC VET, DESTA, MSCP, RRD40, ThinWire, TMSCP, TU, UETP, ULTRIX, VAX, VAX DOCUMENT, VAXcluster, VMS, the AXP logo, and the DIGITAL logo.
A Environment Variables B Power System Controller Fault Displays C Worksheet for Recording Customer Environment Variable Settings Glossary Index Examples 3–1 Running DRVTST 3–2 Running DRVEXR
2–13 Flowchart for Troubleshooting Removable-Media Problems 2–14 Flowchart for Troubleshooting Removable-Media...
3–1 Summary of Diagnostic and Related Commands 4–1 DEC 4000 AXP Fault Detection and Correction 4–2 Error Field Bit Definitions for Error Log Interpretation .......
Preface This guide describes the procedures and tests used to service DEC 4000 AXP systems. Intended Audience This guide is intended for use by Digital Equipment Corporation service personnel and qualified self-maintenance customers. Conventions The following conventions are used in this guide.
In some illustrations, small drawings of the DEC 4000 AXP system appear in the left margin. Shaded areas help you locate components on the front or back of the system. Warning Warnings contain information to prevent personal injury. Caution Cautions provide information to prevent damage to equipment or software.
RBDs and LEDs. If the operating system is up, use the operating system environment diagnostic tools, such as error logs, crash dumps, DEC VET and UETP exercisers, and other log files. System Maintenance Strategy 1–1...
System problems can be classified into the following five categories: 1. Power problems 2. Problems getting to the console 3. Failures reported by the console subsystem 4. Boot failures 5. Failures reported by the operating system Using these categories, you can quickly determine a starting point for diagnosis and eliminate the unlikely sources of the problem.
Table 1–1 (Cont.) Recommended Troubleshooting Procedures Diagnostic Description Tools/Resources Reference 3. Failures Reported by the Console Program (Table 1–4) Power-up console screens Power-up Refer to Section 2.2 for information on indicate a failure. screens interpreting power-up self-tests. Console event Refer to Section 2.2 for information on the console event log.
DEC OSF/1 Krash Utility. DEC VET or Refer to Section 3.3 for a description UETP of DEC VET, and Section 3.4 for information on running UETP software exercisers. Other log files Refer to Chapter 4 for information on using log files such as SETHOST.LOG...
Table 1–2 Diagnostic Flow for Power Problems Symptom Action Reference No AC power at system Check the power source and power cord. as indicated by AC present LED. AC power is present, but Check the system AC circuit breaker system does not power setting.
Table 1–4 Diagnostic Flow for Problems Reported by the Console Program Symptom Action Reference Power-up screens are Use power-up display and/or OCP LEDs Section 2.2 and displayed, but tests do to determine error. Section 2.1.2 not complete. Console program reports Examine the console event log to check Section 2.2.1 error.
Examine the operating system error log Chapter 4 files to isolate the problem. If the problem occurs intermittently, run Section 3.3 and DEC VET or UETP to stress the system. Section 3.4 Examine other log files, such as SETHOST.LOG, OPCOM.LOG, and OPERATOR.LOG.
1. Hardware installation and acceptance testing. Acceptance testing includes running ROM-based diagnostics. 2. Software installation and acceptance testing. For example, using OpenVMS Factory Installed Software (FIS), and then acceptance testing with DEC VET or UETP. 3. Installation of the remote service tools and equipment to allow a Digital Service Center to dial in to the system.
ROM-Based Diagnostics (RBDs) ROM-based diagnostics have significant advantages: • There is no load time. • The boot path is more reliable. • Diagnosis is done in console mode. RECOMMENDED USE: The ROM-based diagnostic facility is the primary means of console environment testing and diagnosis of the CPU, memory, Ethernet, Futurebus+, and SCSI and DSSI subsystems.
RECOMMENDED USE: Use DEC VET or UETP as part of acceptance testing to ensure that the CPU, memory, disk, tape, file system, and network are interacting properly. Also use DEC VET or UETP to stress test the user’s environment and configuration by simulating system operation under heavy loads to diagnose intermittent system failures.
DEC 4000 AXP Model 600 Series Information Set The DEC 4000 AXP Model 600 Series Information Set consists of service documentation that contains information on installing and using, servicing and upgrading, and understanding the system. The guide you are reading is part of the set.
Storage and Retrieval System (STARS) STARS is a worldwide database for storing and retrieving technical information. The STARS databases, which contain more than 150,000 entries, are updated daily. Using STARS, you can quickly retrieve the most up-to-date technical information via DSNlink or DSIN. 1.5 Field Feedback Providing the proper feedback to the corporation is essential in closing the loop on any service call.
Section 2.4 describes the boot sequence. 2.1 Interpreting System LEDs DEC 4000 AXP systems have several diagnostic LEDs that indicate whether modules and subsystems have passed self-tests. The power system controller constantly monitors the power supply subsystem and can indicate several types of failures.
2.1.1 Power Supply LEDs The power supply LEDs (Figure 2–1) are used to indicate the status of the components that make up the power supply subsystem. The following types of failures will cause the power system controller to shut down the system: •...
Figure 2–1 Power Supply LEDs. Labeled items: AC Circuit Breaker, FEU Failure, FEU OK, DC3 Failure, DC3 OK, DC5 Failure, DC5 OK, PSC Failure, PSC OK, Overtemperature Shutdown, Fan Failure, Disk Power Failure, Fault ID Display, AC Present. Power-On Diagnostics and System LEDs 2–3...
Table 2–1 Interpreting Power Supply LEDs Indicator Meaning Action on Error Front End Unit (FEU) AC Present When lit, indicates AC power If AC power is not present, check is present at the AC input the power source and power cord. connector (regardless of circuit If the system will not power up and breaker position).
Make sure the air intake is unobstructed and that the room temperature does not exceed the maximum requirement described in the DEC 4000 Site Preparation Checklist.
Table 2–1 (Cont.) Interpreting Power Supply LEDs Indicator Meaning Action on Error DC–DC Converter (DC3) DC3 OK When lit, indicates that all the DC3 output voltages are within specified tolerances. DC3 Failure When lit, indicates that one of Replace the DC3 converter the output voltages is outside (Chapter 5).
2.1.2 Operator Control Panel LEDs The OCP LEDs (Figure 2–3) are used to indicate the progress and result of self-tests for Futurebus+, memory, CPU, and I/O modules. These LEDs are the primary diagnostic tool for troubleshooting problems getting to the console program.
Refer to Table 2–2 for information on interpreting the OCP LEDs and determining what actions to take when a failure is indicated. Figure 2–4 shows the module locations as they correspond to the LEDs. Table 2–2 Interpreting OCP LEDs Indicator Meaning Action on Error Futurebus+ 6–1...
Figure 2–4 Module Locations Corresponding to OCP LEDs LJ-02052-TI0 2.1.3 I/O Panel LEDs The I/O panel LEDs (Figure 2–5) are used to indicate the status of ThinWire and thickwire (standard) Ethernet fuses. Refer to Table 2–3 for information on interpreting the LEDs and determining what actions to take when a failure is indicated.
2.1.4 Futurebus+ Option LEDs The Futurebus+ option LEDs (Figure 2–6) are used to indicate the progress and result of self-tests for a specific Futurebus+ option. Refer to Table 2–4 for information on interpreting the LEDs and determining what actions to take when a failure is indicated. Figure 2–6 Futurebus+ Option LEDs Fault LJ-02010-TI0...
Storage device LEDs are used to indicate the status of the device. The LEDs for fixed-media storage devices are shown in Figure 2–7 and Figure 2–8. Refer to the DEC 4000 Model 600 Series Owner’s Guide for information on LEDs for the removable-media devices.
Figure 2–7 Fixed-Media Mass Storage LEDs (SCSI) Fast SCSI Fault Local Disk Converter OK Online 3.5-Inch SCSI Fault Local Disk Converter OK Online SCSI Terminator 5.25-Inch SCSI Local Disk Converter OK SCSI Terminator LJ-02486-TI0 Power-On Diagnostics and System LEDs 2–13...
Figure 2–8 Fixed-Media Mass Storage LEDs (DSSI) 3.5-Inch DSSI Fault Local Disk Converter OK Online DSSI Terminator with LED 5.25-Inch DSSI Fault Write Protect Local Disk Converter OK Run/Ready DSSI Terminator with LED LJ-02483-TI0 Table 2–5 Interpreting Fixed-Media Mass Storage LEDs Indicator Meaning Action on Error...
Table 2–5 (Cont.) Interpreting Fixed-Media Mass Storage LEDs Indicator Meaning Action on Error DSSI Terminator When lit, indicates DSSI If the DSSI terminator LED does termination power is present. not light, check the DSSI bus connections for that bus. If bus connections seem secure, the local disk converter module or DC5 converter may need to be replaced...
LJ-02267-TI0 2.2.1 Console Event Log DEC 4000 AXP systems maintain a console event log consisting of status messages received during power-on self-tests. If there are problems during power-up, standard error messages may be embedded in the console event log. To...
>>> cat el Starting console. halt code = 1 PC = 0 initialized idle PCB initializing semaphores test Storage Bus B ncr1, loopback connector attached OR SCSI bus failure, could not acquire bus; Control Lines:ff Data lines:ff ncr1 SCSI bus failure *** Hard Error - Error #800 - Diagnostic Name Device Pass Test Hard/Soft...
Figure 2–11 Flowchart for Troubleshooting Fixed-Media Problems Does the disk drive have power? Check the Disk Power Failure LED on the PSC. LED off LED on Likely LDC failure Check the LDC OK LED on the storage compartment front panel. LED on LED off LDC failure...
Figure 2–12 Flowchart for Troubleshooting Fixed-Media Problems (Continued) Are cables loose or missing? Power down, remove drawer and check all cable connections, reseat drawer and power up. Problems persist Problems solved Cable disconnected Continue Is the storage bus terminated? Check that a terminator is in place. Terminator present Terminator missing Terminator missing...
Table 2–6 Fixed-Media Mass Storage Problems Problem Symptom Corrective Action LDC failure Disk power failure LED on PSC Replace LDC. is on. LDC OK LED on storage compartment front panel is off. Power-up screen reports a failing storage adapter port. Drive failure Fault LED for drive is on Replace drive.
Table 2–6 (Cont.) Fixed-Media Mass Storage Problems Problem Symptom Corrective Action Missing or loose Cable: storage device to ID Remove storage drawer and inspect cables panel—Bus node ID defaults to cable connections. zero; online LEDs do not come Flex circuit: LDC to storage interface module—Disk power failure LED on PSC is on;...
Figure 2–13 Flowchart for Troubleshooting Removable-Media Problems Has the drive failed? Check the drive’s fault LED. LED off LED on (steady) Drive failure Continue Are bus node ID plugs improperly set? Check that all drives on the bus have unique bus node ID numbers (no duplicates). Duplicate bus node IDs Configuration rule violation Check that no drive is set to bus node ID 7 (reserved for host ID).
Figure 2–14 Flowchart for Troubleshooting Removable-Media Problems (Continued) Are cables loose or missing? Power down, remove drive and check all cable connections, replace drive and power up. Problems persist Problems solved Cable disconnected Continue Is the storage bus terminated? Check that a terminator is in place. Terminator present Terminator missing Terminator missing...
Table 2–7 Removable-Media Mass Storage Problems Problem Symptom Corrective Action Drive failure Fault LED for drive is on Replace drive. (steady). Duplicate bus Drives with duplicate bus node Correct bus node ID plugs. node ID plugs ID plugs are missing from the (or a missing configuration screen display.
Table 2–7 (Cont.) Removable-Media Mass Storage Problems Problem Symptom Corrective Action I/O module Problems persist after Replace I/O module. failure eliminating the above problem sources. Backplane Replacing the I/O module does Disassemble system and inspect failure not solve problem—the port backplane interconnect cables.
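The bus node ID checks in the removable-media flowchart (unique IDs on each bus, and no drive set to ID 7, which is reserved for the host) can be sketched as a small configuration check. This is only an illustration: the function name and the drive labels are hypothetical, and the check is not part of the console firmware.

```python
from collections import Counter

HOST_BUS_NODE_ID = 7  # reserved for the host ID, per the flowchart


def check_bus_node_ids(drive_ids):
    """Return a list of rule violations for the drives on one storage bus.

    drive_ids maps a (hypothetical) drive label to its bus node ID plug setting.
    """
    problems = []
    counts = Counter(drive_ids.values())
    # Rule 1: every drive on the bus needs a unique bus node ID.
    for node_id in sorted(counts):
        if counts[node_id] > 1:
            names = sorted(n for n, i in drive_ids.items() if i == node_id)
            problems.append(f"duplicate bus node ID {node_id}: {', '.join(names)}")
    # Rule 2: no drive may use the ID reserved for the host.
    for name in sorted(drive_ids):
        if drive_ids[name] == HOST_BUS_NODE_ID:
            problems.append(
                f"{name} uses bus node ID {HOST_BUS_NODE_ID}, reserved for the host"
            )
    return problems
```

An empty list means neither rule is violated; cabling, termination, and drive faults still have to be checked separately.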
OCP to 0, as shown in Section 6.5. The robust mode setting uses a 9600 console baud rate. 2.3 Power-Up Sequence During the DEC 4000 AXP power-up sequence, the power supplies are stabilized and tested and the system is initialized and tested via the firmware power-on self-tests.
Figure 2–15 AC Power-Up Sequence AC plug is inserted into wall outlet AC circuit breaker is set to on (1) AC power (country-specific voltage) enters FEU module 1. BUS_DIRECT +48 VDC output (always on) immediately FEU creates two +48V outputs: goes to +48 DC inputs on DC5, DC3 and PSC modules 2.
2.3.2 DC Power-Up Sequence DC power is applied to the system with the DC on/off switch on the operator control panel. Figures 2–16 and 2–17 provide a description of the DC power-up sequence. Failures during DC power-up are indicated by the power supply subsystem LEDs. Additional error information is displayed on the PSC Fault ID display.
Figure 2–16 DC Power-Up Sequence DC on/off switch set to on (1) PSC starts DC power-up sequence and status check PSC checks temperature sensor FAILED Failed PSC fault LED is turned on Fans operate at full speed PSC checks overtemperature status (onboard) FAILED - Fans kept running while orderly shutdown is initiated Fan Failure LED is turned on...
Figure 2–17 DC Power-Up Sequence (Continued) PSC waits 30 ms for +5.1 VDC to reach regulation FAILED Output did not reach regulation in time Fans and active DC outputs are turned off Failure LED on DC5 module is turned on PSC latches in shutdown mode DC5 OK LED is turned on PSC commands DC3 to turn on +2.1 VDC output...
2.3.3 Firmware Power-Up Diagnostics After successful completion of AC and DC power-up sequences, the processor performs its power-up diagnostics. These tests verify system operation, load the system console, and test the kernel system, including all boot path devices. These tests are performed as two distinct sets of diagnostics: 1.
The system firmware uses the bootstrap procedure defined by the Alpha AXP architecture and described in the Alpha System Reference Manual. On a DEC 4000 AXP system, bootstrap can be attempted only by the primary processor or boot processor. The firmware uses...
device and optional filename information specified either on the command line or in appropriate environment variables. There are only three conditions under which the boot processor attempts to bootstrap the operating system: 1. The boot command is typed on the console terminal. 2.
The steps leading to the transfer of control to system software may be performed in any order. The final state seen by system software is defined, but the implementation-specific sequence of these steps is not. Prior to beginning a bootstrap, the console must clear any internally pended restarts to any processor. 2.4.2 Loading of System Software The console uses the boot_dev environment variable to determine the bootstrap device and the path to that device.
7. If the bootstrap device list is empty (that is, bootdef_dev or boot_dev is null), the action is implementation-specific. The console may remain in console I/O mode or attempt to locate a bootstrap device in an implementation-specific manner. The boot_file and boot_osflags environment variables are used as default values for the bootstrap filename and option flags.
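The way command-line values and environment-variable defaults combine can be sketched as follows. This is a minimal illustration, not the console implementation: the function name is hypothetical, and treating bootdef_dev as a comma-separated device list is an assumption.

```python
def resolve_boot_parameters(env, device=None, filename=None, flags=None):
    """Values given on the boot command line win; environment variables
    supply the defaults when a value is omitted."""
    if device is not None:
        devices = [device]
    else:
        # Assumption: bootdef_dev holds a comma-separated list of devices.
        raw = env.get("bootdef_dev", "")
        devices = [d.strip() for d in raw.split(",") if d.strip()]
    return {
        # An empty list corresponds to the implementation-specific case
        # described in step 7.
        "devices": devices,
        "filename": filename if filename is not None else env.get("boot_file", ""),
        "flags": flags if flags is not None else env.get("boot_osflags", ""),
    }
```

For example, resolve_boot_parameters({"bootdef_dev": "dka0", "boot_osflags": "0,0"}) selects dka0 with flags 0,0, while passing device="eza0" overrides only the device.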
In a shared memory system, processors cannot independently load and start system software; bootstrapping is controlled by the primary processor. DEC 4000 AXP systems always select CPU0 as the primary processor. The secondary processor polls a mailbox for a start address. 2.4.5 Boot Devices The supported boot devices shown in Table 2–8 are determined by the console’s...
Section 3.5 describes acceptance testing and initialization procedures. 3.1 Running ROM-Based Diagnostics DEC 4000 AXP ROM-based diagnostics (RBDs), which are part of the console firmware that is loaded from the FEPROM on the I/O module, offer many powerful diagnostic utilities, including the ability to examine error logs from the console environment and run system- or device-specific exercisers.
2. Three related commands are used to list system bus FRUs, report the status of RBDs in progress, and report errors: • The show fru command (Section 3.1.2) reports system bus FRUs, module part numbers, hardware and software revision numbers, and summary error information.
3.1.1 test The test command runs firmware diagnostics for the entire system, specified subsystems, or specific devices. These firmware diagnostics are run in the background. When the tests are successfully completed, the message ‘‘tests done’’ is displayed. If any of the tests fail, a failure message is displayed. If you do not specify an argument with the test command, all tests except those...
[scsi] Firmware diagnostics will test the SCSI subsystem, including read-only tests of all SCSI disks and read-write tests for SCSI tape drives. [fbus] Firmware diagnostics will instruct all Futurebus+ modules to perform extended category default self-tests. [memory] Firmware diagnostics will test memory modules present in the system. [ethernet] Firmware diagnostics will test the Ethernet logic.
3.1.2 show fru The show fru command reports FRU and error information for the following FRUs based on the serial control bus EEPROM data: • CPU modules • Memory modules • I/O modules • Futurebus+ modules For each of the above FRUs, the slot position, option, part, revision, and serial numbers, as well as any reported symptom-directed diagnostics (SDD) and test-directed diagnostics (TDD) event logs are displayed.
Option name (I/O, CPU#, or MEM#) Part number of option Revision numbers (hardware and firmware) Serial number Events logged: SDD: Number of symptom-directed diagnostic events logged by the operating system, or in the case of memory, by the operating system and firmware diagnostics.
3.1.3 show_status The show_status command reports one line of information per executing diagnostic. The information includes ID, diagnostic program, device under test, error counts, passes completed, and bytes written and read. Many of the diagnostics run in the background and provide information only if an error occurs.
3.1.4 show error The show error command reports error information based on the serial control bus EEPROM data. Both the operating system and the ROM-based diagnostics log errors to the serial control bus EEPROMs. This functionality provides the ability to generate an error log from the console environment. A closely related command, show fru (Section 3.1.2), reports FRU and error...
Multi-chip (0=no, 1=yes)—indicates that a group of entries are the result of a single error. Event type: 11—DRAM hard-failure 01—Correctable read data (CRD) error 10—Uncorrectable error 00—Other (non-DRAM error) Running System Diagnostics 3–9...
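The event type codes listed above can be sketched as a lookup table. This is only an illustration: the function name is hypothetical, and where the two event-type bits sit within a logged entry is not shown in this excerpt, so the sketch takes the already-extracted 2-bit value as input.

```python
# 2-bit event type codes as listed in the text.
EVENT_TYPES = {
    0b00: "Other (non-DRAM error)",
    0b01: "Correctable read data (CRD) error",
    0b10: "Uncorrectable error",
    0b11: "DRAM hard-failure",
}


def describe_event(event_type, multi_chip):
    """Translate an event type code and the multi-chip flag (0=no, 1=yes)."""
    if event_type not in EVENT_TYPES:
        raise ValueError("event type is a 2-bit field (0 through 3)")
    text = EVENT_TYPES[event_type]
    if multi_chip:
        # The flag marks a group of entries that result from a single error.
        text += "; part of a group of entries from a single error"
    return text
```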
3.1.5 memexer The memexer command tests memory by running a specified number of memory exercisers. The exercisers are run in the background and nothing is displayed unless an error occurs. Each exerciser tests all available memory in 2-MB blocks for each pass. To terminate the memory tests, use the kill command to terminate an individual...
3.1.6 memexer_mp The memexer_mp command tests memory cache coherency in a multiprocessor system by running a specified number of memory exerciser sets. A set is a memory test that runs on each processor checking alternate longwords. The exercisers are run in the background and nothing is displayed unless an error occurs.
3.1.7 exer_read The exer_read command tests a disk by performing random reads of 2048 bytes on one or more devices. The exercisers are run in the background and nothing is displayed unless an error occurs. The tests continue until one of the following conditions occurs: 1.
Examples: >>> exer_read failed to send command to pkc0.1.0.2.0 failed to send Read to dkc100.1.0.2.0 *** Hard Error - Error #5 - Diagnostic Name Device Pass Test Hard/Soft 31-JUL-1992 exer_kid 00000175 dkc100.1.0.2 14:54:18 Error in read of 0 bytes at location 014DD400 from device dkc100.1.0.2.0 *** End of Error *** >>>...
3.1.8 exer_write The exer_write command tests a disk by performing random writes on one or more devices. The exercisers are run in the background and nothing is displayed unless an error occurs. The exer_write tests cause the device to seek to a random block and read a 2048-byte packet of data, write that same data back to the same location on the device, read the data again, and compare it to the data originally read.
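The read, write-back, re-read, and compare cycle described above can be sketched against an in-memory stand-in for a device. This is a minimal illustration, not the firmware exerciser: the function name is hypothetical, and aligning the random offset to a 2048-byte boundary is an assumption.

```python
import io
import random

PACKET_SIZE = 2048  # packet size given in the text


def write_verify_pass(device, device_size, rng):
    """Run one cycle on a seekable binary file object.

    Returns (offset, ok), where ok is False if the re-read data differs
    from the data originally read.
    """
    block_count = device_size // PACKET_SIZE
    # Assumption: the random block is aligned to the packet size.
    offset = rng.randrange(block_count) * PACKET_SIZE
    device.seek(offset)
    original = device.read(PACKET_SIZE)   # read a packet
    device.seek(offset)
    device.write(original)                # write the same data back
    device.seek(offset)
    reread = device.read(PACKET_SIZE)     # read it again
    return offset, reread == original     # compare with the first read


# Usage: exercise a 16-KB in-memory "disk" for a few passes.
disk = io.BytesIO(bytes(range(256)) * 64)
results = [write_verify_pass(disk, 16384, random.Random(1)) for _ in range(4)]
```

Because the data written is the data just read, a healthy device ends each pass with its contents unchanged; the manual's warning about running exer_write on disks that hold data still applies to the real command.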
Examples: >>> exer_write dka0 EXECUTING THIS COMMAND WILL DESTROY DISK DATA OR DATA ON THE SPECIFIED DEVICES Do you really want to continue? [Y/(N)]: failed to send command to pkc0.1.0.2.0 failed to send Read to dkc100.1.0.2.0 *** Hard Error - Error #5 - Diagnostic Name Device Pass Test Hard/Soft 31-JUL-1992...
3.1.9 fbus_diag The fbus_diag command is used to start execution of a diagnostic test script onboard a specific Futurebus+ device. The fbus_diag command uses the Futurebus+ standard test CSR interface to initiate commands on specific Futurebus+ devices, waits for tests to complete, and then reports the results to the console.
[-cat] (test_group) Specifies the test category to be executed. The possible categories are as follows: • Init: Initialization tests • Extended: Extended tests (default category) • System: System tests • Manual: Manual tests • x: Bit mask of the desired test categories [-opt] (test_option) Specify the Test Start CSR Option field bits to be set.
3.1.10 show_mop_counter The show_mop_counter command displays the MOP counters for the specified Ethernet port. Synopsis: show_mop_counter [port_name] Arguments: [port_name] Specifies the Ethernet port for which to display MOP counters: eza0 for Ethernet port 0; ezb0 for Ethernet port 1. Examples: >>> show_mop_counter eza0...
3.1.11 clear_mop_counter The clear_mop_counter command initializes the MOP counters for the specified Ethernet port. Synopsis: clear_mop_counter [port_name] Arguments: [port_name] Specifies the Ethernet port for which to initialize MOP counters: eza0 for Ethernet port 0; ezb0 for Ethernet port 1. Examples: >>> clear_mop_counter eza0...
3.1.12 Loopback Tests Internal and external loopback tests can be used to isolate a failure by testing segments of a particular control or data path. The loopback tests are a subset of the RBDs. 3.1.12.1 Testing the Auxiliary Console Port (exer) Using a loopback connector (29–24795–00) and a form of the exer command, you...
3.1.13 kill and kill_diags The kill and kill_diags commands terminate diagnostics that are currently executing. • The kill command terminates a specified process. • The kill_diags command terminates all diagnostics. Synopsis: kill_diags kill [PID . . . ] Arguments: [PID . . . ] The process ID of the diagnostic to terminate (as reported by show_status).
Table 3–1 (Cont.) Summary of Diagnostic and Related Commands Command Function Reference Extended Testing/Troubleshooting memexer Exercises memory by running a specified number of Section 3.1.5 memory tests. The tests are run in the background. memexer_mp Tests memory in a multiprocessor system by running Section 3.1.6 a specified number of memory exerciser sets.
POST is also used to handle two types of error conditions in the drive: • Controller errors are caused by the hardware associated with the controller function of the drive module. A controller error is fatal to the operation of the drive, since the controller cannot establish a logical connection to the host.
Use the set host -dup command to access the local programs listed above. Example 3–1 provides an abbreviated example of running DRVTST for a device (Bus node 2 on Bus 0). Caution When running internal drive tests, always use the default (0 = No) in responding to the ‘‘Write/read anywhere on medium?’’...
DEC VET runs on both OpenVMS AXP and DEC OSF/1 operating systems. DEC VET consists of a manager and exercisers that test devices. The DEC VET manager controls these exercisers. DEC VET exercisers test system hardware and the operating system.
3.4 Running UETP The User Environment Test Package (UETP) tool is an OpenVMS AXP software package designed to test whether the OpenVMS AXP operating system is installed correctly. UETP software puts the system through a series of tests that simulate a typical user environment, by making demands on the system that are similar to demands that might occur in everyday use.
2. Make sure no user programs are running and no user volumes are mounted. Caution By design, UETP assumes and requests the exclusive use of system resources. If you ignore this restriction, UETP may interfere with applications that depend on these resources. 3.
Press Return after each prompt. After you answer the last question, UETP initiates its entire sequence of tests, which run to completion without further input. The final message should look like the following: ***************************************************** END OF UETP PASS 1 AT 20-JUL-1992 16:30:09.38 ***************************************************** 5.
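When UETP output is captured to a file, the end-of-pass banner shown above can be located with a short script. This is a sketch under stated assumptions: the function name is hypothetical, and the pattern assumes the banner always follows the form of the sample message.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in OpenVMS-style timestamps.
MONTHS = {name: number for number, name in enumerate(
    ["JAN", "FEB", "MAR", "APR", "MAY", "JUN",
     "JUL", "AUG", "SEP", "OCT", "NOV", "DEC"], start=1)}

END_OF_PASS = re.compile(
    r"END OF UETP PASS (\d+) AT "
    r"(\d{1,2})-([A-Z]{3})-(\d{4}) (\d{2}):(\d{2}):(\d{2})\.(\d{2})"
)


def parse_end_of_pass(text):
    """Return (pass_number, finish_time) or None if no banner is found."""
    match = END_OF_PASS.search(text)
    if match is None or match.group(3) not in MONTHS:
        return None
    pass_no, day, month, year, hh, mm, ss, hundredths = match.groups()
    finished = datetime(int(year), MONTHS[month], int(day),
                        int(hh), int(mm), int(ss),
                        int(hundredths) * 10000)  # hundredths -> microseconds
    return int(pass_no), finished
```

A result of None means no banner was found in the text, which is a reason to inspect the log rather than proof that the pass failed.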
This command initializes DUA1, and assigns the volume label TEST1 to the disk. All volumes must have unique labels. 3. Mount the disk. For example: $ MOUNT/SYSTEM DUA1: TEST1 This command mounts the volume labeled TEST1 on DUA1. The /SYSTEM qualifier indicates that you are making the volume available to all users on the system.
Tape cartridges must be labeled UETP to be tested. As a safety feature, UETP does not test tape cartridges that have been mounted with the MOUNT command. 3.4.5.1 TLZ06 Tape Drives During the initialization phase, UETP sets a time limit of 6 minutes for a TLZ06 unit to complete the UETTAPE00 test.
Note UETP will not test your Ethernet adapter if DECnet for OpenVMS AXP or another application has the device allocated. Because either DECnet for OpenVMS AXP or the LAT terminal server might also try to use the Ethernet adapter (a shareable device), you must shut down DECnet for OpenVMS AXP and the LAT terminal server before you run the device test phase, if you want to test the Ethernet adapter.
3.4.10 Termination of UETP At the end of a UETP pass, the master command procedure UETP.COM displays the time at which the pass ended. In addition, UETP.COM determines whether UETP needs to be restarted. At the end of an entire UETP run, UETP.COM deletes temporary files and does other cleanup activities.
3.4.12.1 UETP Log Files UETP stores all information generated by all UETP tests and phases from its current run in one or more UETP.LOG files, and it stores the information from the previous run in one or more OLDUETP.LOG files. If a run of UETP involves multiple passes, there will be one UETP.LOG or one OLDUETP.LOG file for each pass.
2. Bring up the operating system. 3. Run DEC VET or UETP to test that the operating system is correctly installed. Refer to Section 3.3 for information on DEC VET. Refer to Section 3.4 for instructions on running UETP.
Section 4.2 describes the entry format used by the ERF/UERF error formatters. • Section 4.3 describes how to translate the error log information using the OpenVMS AXP and DEC OSF/1 error formatters. • Section 4.4 describes how to interpret the system error log to isolate the failing FRU.
Table 4–1 DEC 4000 AXP Fault Detection and Correction Component Fault Detection/Correction Capability KN430 Processor Module DECchip 21064 micropro- Error Detection and Correction (EDC) logic. For all data cessor entering the 21064 microprocessor, single bits are checked and corrected; for all data exiting the 21064 microprocessor, the appropriate check bits are generated.
The causes for each of the machine check/interrupts are as follows. The system control block (SCB) vector through which PALcode transfers control to the operating system is shown in parentheses. Processor Machine Check (SCB: 670) Processor machine check errors are fatal system errors and immediately crash the system.
Nut transactions consist of a command/address cycle and two dummy data cycles for which no data is transferred. For more information, refer to the DEC 4000 Model 600 Series Technical Manual. 4.2 Error Logging and Event Log Entry Format The OpenVMS AXP and DEC OSF/1 error handlers can generate several entry types.
Each entry consists of an operating system header, kernel event frame, several device frames, and an end frame. Most entries have a PAL-generated logout frame, and may contain registers for a second CPU, memory (0–3), and I/O. Figure 4–1 shows the general error log format used by the ERF/UERF error formatters.
Both ERF and UERF provide bit-to-text translation for the kernel event frame. Section 4.3.1 summarizes the commands used to translate the error log information for the OpenVMS AXP operating system. Section 4.3.2 summarizes the commands used to translate the error log for the DEC OSF/1 operating system. 4.3.1 OpenVMS AXP Translation The kernel error log entries are translated from binary to ASCII using the ANALYZE/ERROR command.
Example 4–1). Otherwise, nothing is shown in the translation column. Section 4.4.9 provides a sample ERF-generated error log. 4.3.2 DEC OSF/1 Translation Error log information is written to /var/adm/binary.errlog. Use the following command to save the error log information by copying it to another file: $ cp /var/adm/binary.errlog /tmp/errors_upto_today...
There are eight possible notes, Note 1–Note 8. Each note provides a synopsis of the problem and additional information to consider for analysis. Section 4.4.9 provides a sample ERF-generated error log. Section 4.4.10 provides a sample UERF-generated error log. Table 4–2 Error Field Bit Definitions for Error Log Interpretation Error Field Bits U/ERF Bit-to-Text Definition Module/Notes...
Table 4–2 (Cont.) Error Field Bit Definitions for Error Log Interpretation Error Field Bits U/ERF Bit-to-Text Definition Module/Notes Quadword 1, CPU1-Detected W1-Byte-0, CPU Machine Check Related Errors <0> C3_1_CA_NOACK CPU_1 Bus Command No-Ack CPU_1, Note 1 <1> C3_1_WD_NOACK CPU_1 Bus Write Data No-Ack CPU_1, Note 2 <2>...
Table 4–2 (Cont.) Error Field Bit Definitions for Error Log Interpretation Error Field Bits U/ERF Bit-to-Text Definition Module/Notes W2-Byte-1, Event Correlation Flags <0> C3_MEM_R_ERROR CPU error caused by memory Note 4 <1> IO_MEM_R_ERROR I/O error caused by memory <2> C3_OCPU_ADD_ CPU error caused by other CPU Note 4 MATCH...
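Translating a byte of the error field into the bit names in Table 4–2 amounts to testing each bit in turn. The sketch below covers only the three event correlation flags visible in this excerpt; the function name is hypothetical, and extracting the byte from the error field quadword is assumed to have been done already.

```python
# Bit names for W2-Byte-1 (event correlation flags) as far as this
# excerpt of Table 4-2 lists them; higher bits are not shown here.
EVENT_CORRELATION_FLAGS = {
    0: "C3_MEM_R_ERROR",      # CPU error caused by memory
    1: "IO_MEM_R_ERROR",      # I/O error caused by memory
    2: "C3_OCPU_ADD_MATCH",   # CPU error caused by other CPU
}


def decode_flags(byte_value, names=EVENT_CORRELATION_FLAGS):
    """Return the names of the bits set in an 8-bit error field byte."""
    set_bits = []
    for bit in range(8):
        if byte_value & (1 << bit):
            set_bits.append(
                names.get(bit, f"bit {bit} (not listed in this excerpt)")
            )
    return set_bits
```

The same function can decode the other bytes of the error field by passing a different names table built from the corresponding rows of Table 4–2.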
Table 4–2 (Cont.) Error Field Bit Definitions for Error Log Interpretation Error Field Bits U/ERF Bit-to-Text Definition Module/Notes Quadword 1 Responder Errors W0-Byte-0, Command/Address Parity Error Detected <0> C3_0_CA_PAR CPU_0 Bus Command/Add Parity Error CPU_0, Note 1 <1> C3_1_CA_PAR CPU_1 Bus Command/Add Parity Error CPU_1, Note 1 <2>...
Analysis: Note All bus nodes check command/address parity during the command/address cycle. • _CA_NOACK errors without respective command/address parity errors are most likely caused by problems in the bus commander, such as programming errors, address generation, and the like. You should consider the context of the error;...
Examine the commander’s command trap register to identify the respective responder. • _WD_NOACK errors with the responder reporting _WD_PAR errors could indicate a failure with either device. • _WD_PAR errors without respective _WD_NOACK would require two failures to occur: 1. Bad data received by responder 2.
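The no-ack and parity rules in the analysis above can be summarized as a small decision helper. This is a hedged sketch: the function name `triage` and the hint strings are hypothetical, the bit name `C3_0_WD_PAR` in the usage line is only an illustrative name ending in `_WD_PAR`, and combinations that the text above does not cover produce no hint.

```python
def triage(flags):
    """Map a set of error-field bit names to hints from the analysis text.

    flags: set of bit names such as {"C3_1_CA_NOACK"}.
    Only the three combinations described in the text are covered.
    """
    def has(suffix):
        return any(name.endswith(suffix) for name in flags)

    hints = []
    if has("_CA_NOACK") and not has("_CA_PAR"):
        # No-ack without a command/address parity error.
        hints.append("suspect the bus commander")
    if has("_WD_NOACK") and has("_WD_PAR"):
        # Responder saw bad write data and did not acknowledge it.
        hints.append("suspect either the commander or the responder")
    if has("_WD_PAR") and not has("_WD_NOACK"):
        # Parity error without the matching no-ack.
        hints.append("unusual: would require two failures")
    return hints


if __name__ == "__main__":
    print(triage({"C3_1_WD_NOACK", "C3_0_WD_PAR"}))
```

The hints only narrow the search; the commander's command trap register and the context of the error still have to be examined as described above.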
The failing module is the CPU reporting the failure, except: • If EV_SYN_1F (‘‘CPU reported syndrome 0x1f’’) or C3_SYN_1F (‘‘C3 reported syndrome 0x1f’’) bits are set in the error field, known bad data was supplied to the CPU from another source (either memory or the other CPU). –...
4.4.7 Note 7: Futurebus+ Mailbox Access Parity Error Synopsis: A data parity error occurred during reading of data from a Futurebus+ option via a mailbox operation. Analysis: The failing module could be either the I/O module or one of the Futurebus+ options.
Page 113
Example 4–1 ERF-Generated Error Log Entry Indicating CPU Corrected Error V M S SYSTEM ERROR REPORT COMPILED 17-NOV-1992 10:54:57 PAGE ******************************* ENTRY 1. ******************************* ERROR SEQUENCE 1. LOGGED ON: CPU_TYPE 00000002 DATE/TIME 21-SEP-1992 12:00:24.83 SYS_TYPE 00000002 SYSTEM UPTIME: 0 DAYS 00:10:04 SCS NODE: DSSI3 VMS T1.0-FT4 CACHE ERROR KN430...
The low quadword of the error field register, ERROR FLAG1, has two bits set. The corresponding bit-to-text translations may not be provided for some versions of DEC OSF/1. The high quadword of the error field register, ERROR FLAG2, has no bits set.
Page 115
Example 4–2 (Cont.) UERF-Generated Error Log Entry Indicating CPU Error FRAME REVISION x0001 SCB VECTOR x0670 FRU 1 x0000 FIELD NOT VALID FRU 2 x0000 FIELD NOT VALID SEVERITY x0001 SEVERITY FATAL CPU ID x0000 ERROR COUNT x0001 THRESHOLD FOR FAIL C x0000 FIELD NOT VALID FAIL CODE...
Repairing the System This chapter describes the removal and replacement procedures for DEC 4000 AXP systems. • Section 5.1 gives general guidelines for FRU removal and replacement. • Section 5.2 covers FRUs accessed at the front of the system. •...
Page 118
Refer to the DEC 4000 AXP Model 600 Illustrated Parts Breakdown: Mass Storage Device (EK–MS430–IP) and DEC 4000 AXP Model 600 Illustrated Parts Breakdown: Series Enclosure (EK–EN430–IP) if you need a more detailed illustration. Caution Only qualified service personnel should remove or install FRUs.
Page 119
Warning The following warning symbols appear on the system enclosure. Please review their definitions. Hazardous voltages are present within the front end unit (AC power supply). Do not access unless properly trained. Before you access this unit, remove AC power by pressing the AC circuit breaker to the Off (0) position, and unplug the power cord.
5.2.3 Fixed-Media Storage Refer to Figures 5–3 through 5–7 for removal and replacement information. For more detailed cabling illustrations refer to the DEC 4000 AXP Model 600 Illustrated Parts Breakdown: Mass Storage Device (EK–MS430–IP). 5.2.3.1 3.5-Inch Fast-SCSI Disk Drives (RZ26, RZ27, RZ35) Refer to Figure 5–3 and Figures 5–5 and 5–6.
Part Number Name 54–21135–01 Module, hard disk interface card 54–21191–01 RF35/RZ35 remote front panel 54–21835–01 Termination board, SCSI RZXX–MY 3.5-inch drive with tray-specific cable Removal and Replacement Tips When adding or replacing 3.5-inch SCSI disk drives you must remove the drive’s three resistor packs and two terminator power jumpers (Figure 5–5) before installing the drive to its storage tray.
installing the drive to its storage tray. Failure to do so will result in problems with the SCSI bus. Refer to Figure 5–6 to determine the proper placement of drives within the storage tray. The position of the drive corresponds to the bus node ID plugs as shown.
5.2.3.5 3.5-Inch DSSI Disk Drive Refer to Figures 5–4 and 5–6. Part Number Name BA6FE–MY Storage tray for up to four 3.5-inch DSSI disk ISEs 70–28752–02 Cable assembly (includes 17–03408–01 cable, 50-conductor) 17–03057–01 Harness assembly, 2-conductor (local disk converter module to storage interface card) 17–03401–01 Harness assembly, 4-conductor (local disk converter module to...
Cable assembly, 50-conductor, interface card to bulkhead 12–29258–01 Terminator, DSSI 5.2.4 Removable-Media Storage (Tape and Compact Disc) For information on removal and replacement of removable-media drives, refer to the DEC 4000 AXP Model 600 Series Options Guide (EK–KN430–OG). 5.2.4.1 SCSI Bulkhead Connector Part Number Name 70–29427–01 Cable/bracket assembly with 17–03182–01 cable...
Figure 5–1 SCSI Continuity Card Placement Dual Half-Height SCSI Drives Full-Height SCSI Drives Continuity Card Full-Height SCSI Drives Dual Half-Height SCSI Drives Full-Height Full-Height SCSI Drives SCSI Drives MLO-009431 5.2.5 Fans Two fans (fan number 3 and 4) are accessed at the front of the system. Repairing the System 5–9...
Part Number Name 12–36202–01 17–03111–01 Fan power harness Figure 5–2 Front FRUs Vterm Module Operator Control Panel SCSI Terminator Fixed-Media Mass Storage Assemblies DSSI Terminator Tray Release Latch Fan Assembly Cable Guide (front) Removable-Media Mass Storage Assembly SCSI Out...
Figure 5–3 Storage Compartment with Four 3.5-inch Fast-SCSI Drives (RZ26, RZ27, RZ35) Local Disk Converter Pull handle to Half-Height Half-Height remove connector Fast SCSI Drive Fast SCSI Fast SCSI from drive Bezel Assembly Drive Assembly Terminator SCSI ID Module 4 8 V >...
Figure 5–5 3.5-Inch SCSI Drive Resistor Packs and Power Termination Jumpers SCSI Drive Resistor Packs (3)* Power Termination Jumpers (2)* * Must be removed before drive is installed to storage tray. LJ-02268-TI0 Repairing the System 5–13...
Figure 5–6 Position of Drives in Relation to Bus Node ID Numbers Fixed-Media Storage Tray Local Disk Bus Node ID Bus Node ID Converter Bus Node ID Bus Node ID Tray Release Latch Front Panel Bulkhead Connector LJ-02269-TI0 5–14 Repairing the System...
5.3 Rear FRUs The following sections contain the part numbers of the FRUs accessed at the rear of the system. Text is provided for additional procedures or precautions. Refer to Figure 5–8 for the location of rear FRUs. 5.3.1 Modules (CPU, Memory, I/O, Futurebus+) Part Number Name B2001–AA...
>>> set bootdef_dev eaz0 >>> set boot_osflags 0,1 >>> 5.3.2 Ethernet Fuses Ethernet fuses are located on the I/O module. Refer to Figure 5–9 for the specific fuse location. Part Number Name 12–09159–00 0.5 A ThinWire Ethernet fuse (F1, F3) 12–10929–08 1.5 A thickwire Ethernet fuse (F2, F4) 5.3.3 Power Supply...
Figure 5–8 Rear FRUs Futurebus+ Module Memory Module Module Module Front End Unit Power System Controller Converter Converter AC Cord Cable Guide Fan Assembly Base Unit Interlock (rear)
5.4 Backplane Refer to Figures 5–10 and 5–11. Part Number Name 70–28747–01 Backplane assembly 17–03340–01 Cable assembly, 100-conductor backplane-to-backplane (2) 17–03341–01 Cable assembly, 40-conductor, backplane-to-backplane Removal and Replacement Tips To remove the backplane: 1. Unseat all modules (CPU, memory, I/O, and power supply modules) from the rear backplane.
Figure 5–11 Removing Backplane Front Chassis Rear Chassis Storage Frame Card Cage Backplane SCSI Assembly Assembly Backplane Continuity Cards Assembly Assembly Screws (upper and lower) Screw locations are the same on the Cable Guide (front) other side of the system. LJ-01794-TI0 5.5 Repair Data for Returning FRUs When you send back an FRU for repair, staple the error log to the fault tag or...
Section 6.5 describes how to set console line baud rates. 6.1 Functional Description The DEC 4000 AXP system is a department-level system that uses the custom VLSI CPU chip (DECchip 21064 microprocessor) based on the Alpha AXP RISC architecture. The system is housed in a BA640 enclosure and includes the following components: •...
Page 140
• Four fixed-media storage compartments (each can hold up to 4 half-height drives or 1 full-height drive). • A removable-media storage compartment (can hold 2 full-height or up to 4 half-height devices) • Four fans • Backplane assembly (includes system backplane: serial control bus, Futurebus+, and power bus;...
Figure 6–1 System Block Diagram Power Subsystem To Outlet Front Power System Unit Contr Serial Control Bus Memory 3 Memory 2 CPU 1 Memory 1 CPU 0 Memory 0 Operator Control 64, 128 MB Panel System Bus Serial Control Bus DSSI/SCSI Bus A Ethernet Port 0 DSSI/SCSI Bus B...
Figure 6–2 System Backplane System Backplane Storage Backplane Fixed-Media Side Removable-Media Side Serial Control Bus Local I/O Buses from I/O Module Vterm and OCP Futurebus+ DSSI/SCSI J6 J4 J2 Bus A SCSI-2 Bus E System Bus DSSI/SCSI Bus B Futurebus+ Modules Memory CPUs DSSI/SCSI...
Figures 6–3 and 6–4 show the front and rear of the BA640 enclosure. Figure 6–3 BA640 Enclosure (Front) Air Plenum DC On/Off Switch Operator Control Panel Cable Guide Base Unit, Contains Fans 3 and 4 Fixed-Media Mass Storage Compartments Removable-Media Mass Storage Compartment MLO-007714 System Configuration and Setup 6–5...
Figure 6–4 BA640 Enclosure (Rear) Serial and Model Number Label AC Circuit Breaker Cable Guide Base Unit, Contains Fans 1 and 2 Card Cage Power Subsystem MLO-007715 6–6 System Configuration and Setup...
6.1.1 System Bus The system bus interconnects the CPUs, memory modules, and I/O module. The I/O module provides access to basic I/O functions (network, storage devices, and console program). The I/O module also is the adapter to the I/O expansion bus, Futurebus+.
Figure 6–5 CPU Block Diagram (Diagram not reproduced. Recoverable labels: DECchip 21064 microprocessor; serial control bus to memory module, I/O module, power supply, and operator control panel; EEPROM; clock detect; signals INV_ADR<12:5>, Addr<33:5>, and DATA_A<4>.)
Page 147
CPU Features Each CPU has the following features: • DECchip 21064 processor chip (approximately 100 MIPS, 20 MFLOPS) • 1-MB direct-mapping backup cache (physical write-back cache, 32-byte block size) • Interface to system bus (128 bits wide) • System bus arbiter •...
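As a worked illustration of the cache figures listed above, the arithmetic for a 1-MB direct-mapping cache with a 32-byte block size can be sketched in Python. This is the textbook tag/index/offset split for any direct-mapped cache of that size; it is an assumption for illustration, not a description of how the KN430 hardware actually lays out its tag store, and the function name `split_address` is hypothetical.

```python
CACHE_BYTES = 1 << 20   # 1-MB backup cache
BLOCK_BYTES = 32        # 32-byte block size

NUM_BLOCKS = CACHE_BYTES // BLOCK_BYTES       # number of cache blocks
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1    # bits selecting a byte in a block
INDEX_BITS = NUM_BLOCKS.bit_length() - 1      # bits selecting the cache block


def split_address(addr):
    """Split a physical address into (tag, index, offset)."""
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_BLOCKS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset


if __name__ == "__main__":
    print(NUM_BLOCKS, OFFSET_BITS, INDEX_BITS)
    print(split_address(0x100020))
```

Because each address maps to exactly one block, two addresses that differ only in the tag compete for the same cache location.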
6.1.1.2 Memory MS430 memory modules provide high-bandwidth, low-latency program and data storage elements for DEC 4000 AXP systems. Up to four memory modules can be configured in a DEC 4000 AXP system. The MS430 memory modules are designed to be compatible with two generations of DRAM technology—256K x 4 and 1-MB x 4 parts—and are configured with...
Table 6–1 Memory Features Feature Description Error detection and correction Improves data reliability and integrity by performing (EDC) logic detection and correction of all single-bit errors and the most prevalent forms of 2-bit, 3-bit, and 4-bit errors in the DRAM array. Write transaction buffers Improves total memory bandwidth by allowing write transactions to ‘‘dump and run.’’...
Figure 6–6 MS430 Memory Block Diagram (Diagram not reproduced. Recoverable labels: serial control bus to memory modules, I/O module, power supply, and operator control panel; EEPROM; four DRAM banks, each with 256 data bits plus 24 EDC bits.)
6.1.1.3 I/O Module The KFA40 I/O module contains the base set of necessary I/O functions and is required in all systems. Figure 6–7 provides a block diagram of the I/O module. I/O module functions include: • Four SCSI-2/DSSI buses for fixed-media devices Note Each of the 4 fixed-media buses may operate as a SCSI-2 bus or a DSSI bus.
Figure 6–7 I/O Module Block Diagram (Diagram not reproduced. Recoverable labels: serial control bus to memory module, I/O module, power supply, and operator control panel; serial bus controller; console and auxiliary serial line units; EEPROM; Futurebus+ control; FEPROM; TOY clock; SCSI/DSSI Bus A.)
6.1.2 Serial Control Bus The serial control bus is a two-conductor serial interconnect bus that is independent of the system bus. The serial control bus connects the following modules: • CPUs • I/O module • Memory modules • Power system controller (PSC) •...
FRU data: - show FRU - show error User LJ-02064-TI0 6.1.3 Futurebus+ DEC 4000 AXP systems implement Futurebus+ as the I/O bus. Features of Futurebus+ include: • IEEE open standard • 32- or 64-bit, multiplexed address and data bus •...
6.1.4 Power Subsystem The power subsystem is a universal supply that is designed to operate in all countries. Power for the backplane assembly is provided by the centralized power source. Fixed-media storage devices are powered by local disk converters (LDCs) included in each storage compartment.
6.1.5 Mass Storage System mass storage is supported by SCSI-2 and DSSI adapters that reside on the I/O module. Each SCSI-2/DSSI bus is architecturally limited to eight devices, including host adapter. 6.1.5.1 Fixed-Media Compartments Four DSSI/SCSI-2 adapters support the four fixed-media storage compartments (A–D) (Figure 6–10).
Figure 6–10 Fixed-Media Storage Storage Backplane Fixed-Media Side DSSI/SCSI Bus A DSSI/SCSI Bus B DSSI/SCSI Bus C DSSI/SCSI Bus D Fixed-Media Mass Storage Compartments LJ-02293-TI0 Fixed-Media Configuration Rules • For each SCSI/DSSI bus, do not duplicate bus node ID numbers for the storage devices.
• Any one of the four fixed-media compartments can be either SCSI or DSSI, but drives of both types can never be mixed on the same bus. If SCSI devices are chosen, all devices in the mass storage compartment must be SCSI, and external drives connected to that compartment must also be SCSI.
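The fixed-media configuration rules that are visible above (no duplicate bus node IDs, no mixing of SCSI and DSSI on one bus, and at most eight nodes per bus including the host adapter) can be checked mechanically. The sketch below is illustrative only: the function name `check_bus` and the device tuples are hypothetical, and rules not shown in this section are not covered.

```python
MAX_NODES_PER_BUS = 8  # architectural limit, including the host adapter


def check_bus(devices):
    """Check one SCSI/DSSI bus against the configuration rules.

    devices: list of (bus_node_id, kind) tuples, where kind is "SCSI"
    or "DSSI". The host adapter itself is not included in the list.
    Returns a list of problem descriptions (empty if none are found).
    """
    problems = []

    ids = [node for node, _ in devices]
    duplicates = sorted({node for node in ids if ids.count(node) > 1})
    if duplicates:
        problems.append(f"duplicate bus node IDs: {duplicates}")

    kinds = {kind for _, kind in devices}
    if len(kinds) > 1:
        problems.append("SCSI and DSSI devices mixed on one bus")

    if len(devices) + 1 > MAX_NODES_PER_BUS:
        problems.append("more than eight nodes including the host adapter")

    return problems


if __name__ == "__main__":
    print(check_bus([(0, "DSSI"), (1, "DSSI"), (1, "SCSI")]))
```

Running the check once per bus (A through D) before installing drives gives a quick consistency test of a planned configuration.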
Figure 6–11 Removable-Media Storage Storage Backplane Removable-Media Side J6 J4 J2 SCSI-2 Bus E SCSI Bus Disconnect J7 J5 J3 SCSI Bus Disconnect SCSI-2 Output Removable-Media Mass SCSI continuity cards required Storage Compartment here unless connector is used by half-height devices. LJ-02270-TI0 Removable-Media Configuration Rules •...
• Do not duplicate bus node ID numbers for your storage devices. For Bus E, you can have only one storage device identified as bus node 0, one storage device as 1, and so on. • By convention, storage devices in the removable-media storage compartment are numbered in increasing order from left to right, top to bottom, beginning with zero.
Figure 6–12 Sample Power Bus Configuration System Expander 1 Expander 2 LJ-02488-TI0 Table 6–2 Power Control Bus Connector Function The main out (MO) connector sends the power control bus signal to the expander. One end of a power bus cable is connected here; the other end is connected to the secondary in (SI) connector of an expander power supply.
6.2 Examining System Configuration Several console commands are available for examining system configuration: • show config (Section 6.2.1)—Displays the buses on the system and the devices found on those buses. • show device (Section 6.2.2)—Displays the devices and controllers in the system....
Figure 6–13 Device Name Convention dka0.0.0.0.0 Bus Number: 0 LBus; 1 Futurebus+ Slot Number: 0-4 SCSI/DSSI; 6, 7 Ethernet; 2-13 Futurebus+ nodes Channel Number: Used for multi-channel devices. Bus Node Number: Bus Node ID (from bus node ID plug) Device Unit Number: Unique device unit number (MSCP Unit Number) For Futurebus+ modules, node number, 0 or 1 Storage Adapter ID:...
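The naming convention in Figure 6–13 can be illustrated with a small parser. This sketch assumes that the fields of a name such as dka0.0.0.0.0 read, from left to right, as driver ID, storage adapter ID, device unit number, bus node number, channel number, slot number, and bus number (the reverse of the order in which the figure lists them), and that the driver ID is two letters. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class DeviceName(NamedTuple):
    driver: str    # driver ID, for example "dk"
    adapter: str   # one-letter storage adapter designator
    unit: int      # device unit number
    node: int      # bus node number
    channel: int   # channel number
    slot: int      # slot number
    bus: int       # bus number: 0 LBus; 1 Futurebus+


# Assumed layout: two-letter driver, one-letter adapter, five numeric fields.
_PATTERN = re.compile(r"^([a-z]{2})([a-z])(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_device_name(name):
    """Split a console device name into its component fields."""
    match = _PATTERN.match(name.lower())
    if match is None:
        raise ValueError(f"not a recognized device name: {name!r}")
    driver, adapter, *numbers = match.groups()
    unit, node, channel, slot, bus = (int(n) for n in numbers)
    return DeviceName(driver, adapter, unit, node, channel, slot, bus)


if __name__ == "__main__":
    print(parse_device_name("dka0.0.0.0.0"))
```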
6.2.3 show memory The show memory command displays information for each memory module in the system. Synopsis: show memory Examples: >>> show memory Module Size Base Addr Intlv Mode Intlv Unit ------ ----- --------- ---------- ---------- Not Installed Not Installed Not Installed 128MB 00000000 1-Way...
Table 6–3 Environment Variables Set During System Configuration Variable Attributes Function auto_action NV,W The action the console should take following an error halt or powerfail. Defined values are: BOOT—Attempt bootstrap. HALT—Halt, enter console I/O mode. RESTART—Attempt restart. If restart fails, try boot.
Page 169
flags. The default value when the system is shipped is NULL. The following parameters are used with the DEC OSF/1 operating system: Autoboot. Boots /vmunix from bootdef_dev, goes to multiuser mode. Use this for a system that should come up automatically after a power failure.
Page 170
Synopsis: set [-default] [-integer] [-string] envar value show envar Arguments: envar The name of the environment variable to be modified. value The value that is assigned to the environment variable. This may be an ASCII string. Options: -default Restores variable to its default value. -integer Creates variable as an integer.
6.4 Setting and Examining Parameters for DSSI Devices For a tutorial on DSSI parameters and their function, refer to Section 6.4.3. The following console commands are used in setting and examining DSSI device parameters. • show device du pu (Section 6.4.1)—Displays information for each DSSI device on the system (du specifies drives, pu specifies storage adapters).
dua0.0.0.0.0 Bus Number: 0 LBus; 1 Futurebus+ Slot Number: 0-4 SCSI/DSSI; 6, 7 Ethernet; 2-13 Futurebus+ nodes Channel Number: Used for multi-channel devices. Bus Node Number: Bus Node ID (from bus node ID plug) Device Unit Number: Unique device unit number (MSCP Unit Number) Storage Adapter ID: One-letter storage adapter designator (A,B,C,D, or E) Driver ID:...
Page 173
Options: [-i] Selective interactive mode, set all parameters. [-n] Set device node name, NODENAME (alphanumeric, up to 6 characters). [-a] Set device allocation class, ALLCLASS. [-u] Set device unit number, UNITNUM. [-sn] Set node name (NODENAME) for all DSSI drives on the system to either RFhscn or TFhscn, where: h is the device hose number (0) s is the device slot number (0–3)
Page 175
ALLCLASS The ALLCLASS parameter determines the device allocation class. The allocation class is a numeric value from 0–255 that is used by the OpenVMS AXP operating system to derive a path-independent name for multiple access paths to the same device. The ALLCLASS firmware parameter corresponds to the OpenVMS AXP IOGEN parameter ALLOCLASS.
ALLCLASS is the allocation class for the system and devices, and u is a unique unit number. For example, $1$DIA0. With DEC 4000 AXP systems, you can fill multiple DSSI buses: buses A–D (slot numbers 0–3). Each bus can have up to seven DSSI devices (bus nodes 0–6).
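The duplicate unit number problem can be illustrated with a short check. The sketch below assumes, as the $1$DIA0 example suggests, that with a nonzero allocation class the operating system name depends only on ALLCLASS and the unit number, so two drives with the same unit number on different buses would collide. The function names and the sample device list are hypothetical.

```python
from collections import defaultdict


def vms_name(allclass, unit):
    """Build a name of the form $ALLCLASS$DIAu, for example $1$DIA0."""
    return f"${allclass}$DIA{unit}"


def find_duplicates(allclass, devices):
    """Return names that would be claimed by more than one drive.

    devices: list of (bus_letter, unit_number) tuples.
    """
    seen = defaultdict(list)
    for bus, unit in devices:
        seen[vms_name(allclass, unit)].append(bus)
    return {name: buses for name, buses in seen.items() if len(buses) > 1}


if __name__ == "__main__":
    # Unit numbers left at matching values on Bus A and Bus B.
    print(find_duplicates(1, [("A", 0), ("B", 0), ("A", 1)]))
```

Any name reported by the check identifies drives whose unit numbers (UNITNUM) would need to be changed.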
In the following example, the allocation class will be set to 1, the devices for Bus A (in the DEC 4000 AXP system) will be assigned new unit numbers (to avoid the problem of duplicate unit numbers), and the system disk will be assigned a new node name.
Figure 6–15 Sample DSSI Buses for an Expanded DEC 4000 AXP System System 3 2 1 0 Expander DSSI Cable Bus A Bus B DSSI Terminator Locations LJ-02065-TI0 6.5 Console Port Baud Rate Two serial console ports are provided on the I/O module: •...
6.5.1 Console Serial Port The baud rate for the console serial port is set at the factory to 9600 bits per second. Most Digital terminals are also shipped with a baud rate of 9600. You can select a baud rate for the console serial port using the volatile environment variable, tta0_baud....
6.5.2 Auxiliary Serial Port The baud rate for the auxiliary serial port is set via the nonvolatile environment variable, tta1_baud. Allowable values are 600, 1200, 2400, 4800, 9600, and 19200. Use the set command to assign values to the tta1_baud environment variable....
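A baud rate can be checked against the allowable values before it is typed at the console. The helper below is a hypothetical convenience, not a console feature: it validates the value and returns the console command text in the set envar value form, using the tta0_baud and tta1_baud variable names given in this section.

```python
ALLOWED_BAUD = (600, 1200, 2400, 4800, 9600, 19200)


def set_baud_command(port, baud):
    """Return the console command that sets the baud rate for tta0 or tta1.

    port: 0 for the console serial port, 1 for the auxiliary serial port.
    """
    if port not in (0, 1):
        raise ValueError("port must be 0 (tta0) or 1 (tta1)")
    if baud not in ALLOWED_BAUD:
        raise ValueError(f"baud rate must be one of {ALLOWED_BAUD}")
    return f"set tta{port}_baud {baud}"


if __name__ == "__main__":
    print(">>> " + set_baud_command(1, 9600))
```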
Environment Variables All supported environment variables are listed in Table A–1. Table A–1 Environment Variables Variable Attributes Function Alpha AXP SRM-Defined Environment Variables Reserved auto_action NV,W The action the console should take following an error halt or powerfail. Defined values are: BOOT—Attempt bootstrap.
Page 184
Table A–1 (Cont.) Environment Variables Variable Attributes Function Alpha AXP SRM-Defined Environment Variables boot_file NV,W The default filename used for the primary bootstrap when no filename is specified by the boot command. The default value when the system is shipped is NULL. booted_file The filename used for the primary bootstrap during the last boot.
Page 185
flags. The default value when the system is shipped is NULL. The following parameters are used with the DEC OSF/1 operating system: Autoboot. Boots /vmunix from bootdef_dev, goes to multiuser mode. Use this for a system that should come up automatically after a power failure.
Page 186
Table A–1 (Cont.) Environment Variables Variable Attributes Function Alpha AXP SRM-Defined Environment Variables boot_reset NV,W Indicates whether a full system reset is performed in response to an error halt or boot command. Defined values and the action taken are: OFF—warm boot, no full reset is performed. ON —cold boot, a full reset is performed.
Page 187
Table A–1 (Cont.) Environment Variables Variable Attributes Function Alpha AXP SRM-Defined Environment Variables language NV,W The default language to display critical system messages. 00 none (cryptic) 30 Dansk 32 Deutsch 34 Deutsch (Schweiz) 36 English (American) 38 English (British/Irish) 3A Espanol 3C Francais 3E Francais (Canadian) 40 Francais (Suisse Romande)
Page 188
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables cpu_enabled A bit mask indicating which processors are enabled to run (leave console mode). If this variable is not defined, all available processors are considered enabled. d_bell Specifies whether or not to bell on error if error is detected.
Page 189
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables d_group Specifies the diagnostic group to be executed. FIELD (default) Other diagnostic group string (up to 32 characters) d_harderr Specifies the action taken following hard error detection. CONTINUE HALT (default) LOOP d_oper Specifies whether or not an operator is present.
Page 190
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables d_softerr Specifies the action taken following soft error detection. CONTINUE (default) HALT LOOP d_startup Specifies whether or not to display the diagnostic startup message. OFF (default)—Disables the startup message. ON—Enables the startup message.
Page 191
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables exdep_location Specifies the location referenced by the last examine deposit command. exdep_size Specifies the data size referenced by the last examine deposit command. exdep_space Specifies the address space referenced by the last examine deposit command.
Page 192
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables ez*0_def_ginetaddr Supplies the initial value for ez*0_ginetaddr when the interface’s internal Internet database is initialized from NVRAM (ez*0_inet_init is set to ‘‘nvram’’). ez*0_def_inetaddr Supplies the initial value for ez*0_inetaddr when the interface’s internal internet database is initialized from NVRAM (ez*0_inet_init is set to ‘‘nvram’’).
Page 193
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables ez*0_inet_init Determines whether the interface’s internal Internet database is initialized from NVRAM or from a network server (via the BOOTP protocol). Legal values are ‘‘nvram’’ and ‘‘bootp’’; default is ‘‘bootp.’’...
Page 194
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables ez*0_lp_msg_node Specifies the number of messages originally sent to each node. ez*0_mode Specifies the value for the SGEC mode when the device is started. This value is a mirror of CSR6. It can be different from device to device.
Page 195
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables ez*0_rm_boot_ Sets the MOP Version 4 boot message password passwd for the Ethernet port, either eza0 or ezb0. This password should be entered in hexadecimal in the form ‘‘01-longword-longword,’’ for instance, ‘‘01- 01234567-89abcdef.’’...
Page 196
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables ferr1 Quadword of error information that Futurebus+ modules can store. ferr2 Quadword of error information that Futurebus+ modules can store. fis_name Specifies a string indicating the Factory Installed Software. Key to variable attributes: NV - Nonvolatile.
Page 197
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables interleave Specifies the memory interleave configuration for the system. The value must be one of: ‘‘default,’’ ‘‘none,’’ or an explicit interleave list. The syntax for specifying the configuration is: 0,1,2,3—Indicates the memory module (or slot) numbers.
Page 198
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables ncr*_setup Here ‘‘*’’ may be 0, 1, 2, 3, or 4, corresponding to the storage bus adapters A, B, C, D, or E, respectively. Four bus mode parameters are associated with ncr*_setup: AUTO # Automatically selects SCSI or DSSI...
Page 199
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables Key to variable attributes: NV - Nonvolatile. The last value saved by system software or set by console commands is preserved across system initializations, cold bootstraps, and long power outages. W - Warm nonvolatile.
Page 200
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables Specifies the versions of OpenVMS and OSF/1 PALcode in the firmware. For instance, OpenVMS PALcode X5.12B, OSF/1 PALcode X1.09A. screen_mode Specifies whether or not the power-up screens or console event log are displayed during power-up. ON (default;...
Page 201
Table A–1 (Cont.) Environment Variables Variable Attributes Function System-Dependent Environment Variables tta*_baud Here ‘‘*’’ may be 0 or 1, corresponding to the primary console serial port, tta0 or the auxiliary console serial port, tta1. Specifies the baud rate of the primary console serial port, tta0. Allowable values are 600, 1200, 2400, 4800, 9600, and 19200.
Power System Controller Fault Displays The microprocessor in the PSC reports the fault conditions listed in Table B–1 on the Fault ID display. Table B–1 Power System Controller Fault ID Display Fault ID Display (Hex) Meaning PSC Self-Test Faults During AC Power-Up F + PSC fault LED on PSC bias supply not okay E + PSC fault LED on...
Page 204
Table B–1 (Cont.) Power System Controller Fault ID Display Fault ID Display (Hex) Meaning PSC Self-Test Faults During AC Power-Up Normal, PSC passed AC power on (continued on next page) B–2 Power System Controller Fault Displays...
Page 205
Table B–1 (Cont.) Power System Controller Fault ID Display Fault ID Display (Hex) Meaning PSC Module Faults F + PSC fault LED on PSC bias supply failed (NMI occurred) F + PSC fault LED on Unimplemented opcode interrupt occurred (invalid instruction) F + PSC fault LED on Software trap interrupt occurred (F7 instruction executed)
Page 206
Table B–1 (Cont.) Power System Controller Fault ID Display Fault ID Display (Hex) Meaning PSC Module Faults E029 + PSC fault LED on Masked IRQ16 occurred (FEU POWER became okay) E030 + PSC fault LED on Masked IRQ29 occurred (unused FEU signal) E031 + PSC fault LED on Masked IRQ30 occurred (unused FEU signal) E032 + PSC fault LED on...
Page 207
Table B–1 (Cont.) Power System Controller Fault ID Display Fault ID Display (Hex) Meaning DC–DC Converter Faults E100 Delta overvoltage fail between +5v and +3v DC5, DC3 converters E110 2.1V converter—out of regulation, low E111 2.1V converter—out of regulation, high E112 2.1V converter—under voltage E113...
Page 208
Table B–1 (Cont.) Power System Controller Fault ID Display Fault ID Display (Hex) Meaning FEU Module Faults E200 SWITCHED 48 okay before enabling E201 Fan converter operating before enabling E202 HVDC is okay, but POWER is not okay (contradictory status) E204 DIRECT 48 not okay and POWER is okay (IRQ18)
Worksheet for Recording Customer Environment Variable Settings When replacing the I/O module, use Table C–1 to record the customer’s nonvolatile environment variable settings. After you install the new I/O module, you can restore the customer’s settings. Table C–1 Nonvolatile Environment Variables Environment Variable Factory Default...
Page 211
The process by which the system boots automatically. auxiliary serial port The EIA 232 serial port on the I/O module of the DEC 4000 AXP system. This port provides asynchronous communication with a device, such as a modem. availability The amount of scheduled time that a computing system provides application service during the year.
Page 212
backup cache A second, very fast memory that is used in combination with slower large-capacity memories. bandwidth Bandwidth is often used to express ‘‘high rate of data transfer’’ in an I/O channel. This usage assumes that a wide bandwidth may contain a high frequency, which can accommodate a high rate of data transfer.
Page 213
An acronym for command, control, and communication chip. On the DEC 4000 AXP system, the ASIC gate array chip located on the CPU module. This chip contains CPU command, control, and communication logic, as well as the bus interface unit for the processor module.
Page 214
cache See cache memory. cache block The fundamental unit of manipulation in a cache. Also known as cache line. cache interference The result of an operation that adversely affects the mechanisms and procedures used to keep frequently used items in a cache. Such interference may cause frequently used items to be removed from a cache or incur significant overhead operations to ensure correct results.
Page 215
See computer interconnect. CISC Complex instruction set computer. An instruction set consisting of a large number of complex instructions that are managed by microcode. Contrast with RISC. clean In the cache of a system bus node, refers to a cache line that is valid but has not been written.
Page 216
concurrency Simultaneous operations by multiple agents on a shared object. conditional invalidation Invalidation of a cached location based upon a set of conditions, which are the state of other caches, or the source of the information causing the invalidate. console mode The state in which the system and the console terminal operate under the control of the console program.
Page 217
A bus used to carry signals between two or more components of the system. D-bus On the DEC 4000 AXP system, the bus between the 21064 CPU chip and the ‘‘D-bus micro’’ and the serial ROMs. D-cache Data cache. A high-speed memory reserved for the storage of data. Contrast with I-cache.
Page 218
DEC VET Digital’s DEC Verifier and Exerciser Tool. DEC VET is a multipurpose system maintenance tool that performs exerciser-oriented maintenance testing. direct-mapping cache A cache organization in which only one address comparison is needed to locate any data in the cache, because any block of main memory data can be placed in only one possible position in the cache.
Page 219
DSSI VMScluster A VMScluster system that uses the DSSI bus as the interconnect between DSSI disks and systems. DUP server The Diagnostic Utility Program (DUP) server is a firmware program on-board DSSI devices that allows a user to set host to a specified device in order to run internal tests or modify device parameters.
Page 220
Compartments that house nonremovable storage media. front end unit (FEU) One of four modules in the DEC 4000 AXP system power supply. The FEU converts alternating current from a wall plug to 48 VDC that the rest of the power subsystem can use and convert.
Page 221
(IPR) A register internal to the CPU chip. KN430 CPU The CPU module used by DEC 4000 AXP Model 600 series systems. The KN430 CPU module is based on the DECchip 21064 microprocessor. LAN (local area network) A network that supports servers, PCs, printers, minicomputers, and mainframe computers that are connected over limited distances....
Page 222
latency The amount of time it takes the system to respond to an event. LDC See local disk converter. LED Light-emitting diode. A semiconductor device that glows when supplied with voltage. local disk converter (LDC) A module that regulates voltages for fixed-media storage devices. An LDC module is located in each of the fixed-media storage compartments (A–D), provided that the compartment is not storageless.
Page 223
mass storage device An input/output device on which data is stored. Typical mass storage devices include disks, magnetic tapes, and floppy disks. memory interleaving The process of assigning consecutive physical memory addresses across multiple memory controllers. Improves total memory bandwidth by overlapping system bus command execution across two or four memory modules.
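The memory interleaving entry above can be sketched as a simple address-to-module mapping. The interleave granule of 32 bytes is an assumption made for illustration (this glossary does not state the granularity the system uses), and the function names are hypothetical. The point of the sketch is that consecutive granules land on different modules, so back-to-back accesses can overlap.

```python
INTERLEAVE_GRANULE = 32  # bytes per interleave unit; assumed for illustration


def module_for_address(addr, num_modules):
    """Return which memory module services the given physical address."""
    if num_modules < 1:
        raise ValueError("num_modules must be at least 1")
    return (addr // INTERLEAVE_GRANULE) % num_modules


def local_address(addr, num_modules):
    """Return the byte address within the selected module."""
    granule = addr // INTERLEAVE_GRANULE
    return (granule // num_modules) * INTERLEAVE_GRANULE + addr % INTERLEAVE_GRANULE
```

With four modules, addresses 0, 32, 64, and 96 map to modules 0 through 3, and address 128 wraps back to module 0.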
Page 224
The data or register upon which an operation is performed. operator control panel The panel on the top right side of the DEC 4000 AXP system that contains the power, Reset, and Halt switches and system status lights. page size A number of bytes, aligned on an address evenly divisible by that number, which a system’s hardware treats as a unit for virtual address mapping, sharing,...
Page 225
(PSC) One of four units in the DEC 4000 AXP power supply subsystem. The H7851AA PSC monitors signals from the rest of the system, including temperature, fan rotation, and DC voltages; provides power-up and power-down sequencing to the DC-DC converters; and communicates with the system CPU across the serial control bus.
Page 226
processor machine check Processor machine checks indicate that a processor internal error was detected synchronously to the processor’s execution and was not successfully corrected by hardware or PALcode. Examples of processor machine check conditions include processor B-cache buffer parity errors, memory uncorrectable errors, or read access to a nonexistent location.
Page 227
read-merge Indicates that an item is read from a responder/bystander, and new data is then added to the returned read data. This occurs when a masked write cycle is requested by the processor or when unmasked cycles occur and the CPU is configured to allocate on full block write misses.
Page 228
robust mode A power-up mode (baud rate select switch set to 0) that allows you to power up without initializing drivers or running power-up diagnostics. The console program has limited functionality in robust mode. ROM-based diagnostics Diagnostic programs resident in read-only memory. ROM-based diagnostics are the primary means of console mode testing and diagnosis of the CPU, memory, Ethernet, Futurebus+, SCSI, and DSSI subsystems.
Page 229
(pushed on) the stack, the stack pointer decrements. As items are retrieved from (popped off) the stack, the stack pointer increments. storage assembly All the components necessary to configure storage devices into a DEC 4000 AXP storage compartment. These components include the storage device, brackets, screws, shock absorbers, and cabling.
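The stack pointer behavior described in the stack entry above (decrement on push, increment on pop) can be sketched as follows. The class name `DescendingStack` and the list-backed memory are hypothetical and serve only to illustrate the direction in which the pointer moves.

```python
class DescendingStack:
    """A stack that grows toward lower addresses, as in the entry above."""

    def __init__(self, size):
        self.memory = [0] * size
        self.sp = size  # empty stack: pointer sits just past the last slot

    def push(self, value):
        if self.sp == 0:
            raise OverflowError("stack overflow")
        self.sp -= 1                  # pointer decrements as an item is pushed
        self.memory[self.sp] = value

    def pop(self):
        if self.sp == len(self.memory):
            raise IndexError("stack underflow")
        value = self.memory[self.sp]
        self.sp += 1                  # pointer increments as an item is popped
        return value
```

Items come back in last-in, first-out order, and the pointer returns to its starting value once the stack is empty again.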
Page 230
One of two backplanes in the BA640 enclosure. CPU, memory, I/O, Futurebus+, and power modules plug into this backplane. See also backplane. system bus The private interconnect used on the DEC 4000 AXP CPU subsystem. This bus connects the B2001 processor module, the B2002 memory module, and the B2101 I/O module.
Page 231
TCP/IP Transmission Control Protocol/Internet Protocol. A set of software communications protocols widely used in UNIX operating environments. TCP delivers data over a connection between applications on different computers on a network; IP controls how packets (units of data) are transferred between computers on a network.
Page 232
volume shadowing The process of maintaining multiple copies of the same data on two or more disk volumes. When data is recorded on more than one disk volume, you have access to critical data even when one volume is unavailable. Also called disk mirroring. Vterm module The module located behind the OCP that provides the termination voltages for storage bus E.
Page 233
Index Acceptance testing, 3–34 cat el command, 2–17 ALLCLASS parameter, 6–37 cdp command, 6–34 ANALYZE/ERROR command, 4–6 clear_mop_counter command, 3–19 Auxiliary serial port, 6–44 Cold bootstrap, 2–34 Commands See also Console commands diagnostic, summarized, 3–21 BA640 enclosure diagnostic-related, 3–2 components, 6–1 firmware console, functions of, 1–9 front and rear, 6–5 to examine system configuration, 6–25...
Page 234
3–14 show_status, 3–7 DKUTIL local program, 3–23 test, 3–3 Documentation, 1–11 Console event log, 2–17 See also the DEC 4000 Information Console port baud rate, 6–41 Console port, testing, 3–20 Drive Console serial port, 6–42 error conditions, 3–22 CPU module, 6–7...
Page 235
ERF sample, 4–16 UERF sample, 4–18 Error log format, 4–5 Fan failure, 2–2 Error log translation Fans DEC OSF/1, 4–7 removal and replacement, 5–9, 5–17 OpenVMS AXP, 4–6 Fast SCSI 3.5-inch disk drive Error Log Utility removal and replacement, 5–4 relationship to UETP, 3–28, 3–32...
Page 236
See Programs, local Log files Hang, system, 3–34 See also UETP.LOG file Hardware, installing accounting, 1–10 console event, 1–10 See the DEC 4000 Quick Installation generated by UETP, 3–33 card OLDUETP.LOG, 3–33 HISTRY local program, 3–23 operator, 1–10 sethost, 1–10 Logs event, 1–8...
Page 237
fixed-media, described, 6–19 Operator control panel LEDs, 2–7 to 2–9 removable-media, described, 6–21 memexer command, 3–10 Options memexer_mp command, 3–11 See the DEC 4000 Options Guide Memory module Overtemperature, 2–2 displaying information for, 6–29 MS430, 6–11 removal and replacement, 5–16 Memory modules PARAMS local program, 3–23...
Page 238
6–33 performing extended testing and show error command, 3–8 exercising, 3–2 show fru command, 3–5 running, 3–1 show memory command, 6–29 utilities, 3–1 show_mop_counter command, 3–18 show_status command, 3–7 Site preparation See the DEC 4000 Site Preparation Checklist Index–6...
Page 239
4–4 exercising, 3–2 transaction types, 4–4 memory, 3–10, 3–11 System bus address cycle failures with DEC VET, 3–25 _CA_NOACK, 4–12 with DSSI device internal tests, 3–22 _CA_PAR, 4–12 with UETP, 3–26 reported by bus commander, 4–12 TIMA, 1–11...
Page 240
1–6 UETP$NODE_ADDRESS logical name, procedures, 1–2 3–31 UETP, 3–33 UETP.COM file, termination of, 3–32 with DEC VET, 1–10 UETP.LOG file, 3–33 with loopback tests, 1–9 UNITNUM parameter, 6–37 with operating system exercisers, 1–10 User disk, preparing for UETP, 3–27, with ROM-based diagnostics, 1–9...
Page 241
Reader’s Comments DEC 4000 AXP Service Guide EK–KN430–SV. B01 Your comments and suggestions help us improve the quality of our publications. Please rate the manual in the following categories: Excellent Good Fair Poor Accuracy (product works as described) Completeness (enough information)
Page 242
Do Not Tear – Fold Here and Tape NO POSTAGE NECESSARY IF MAILED IN THE UNITED STATES BUSINESS REPLY MAIL FIRST CLASS PERMIT NO. 33 MAYNARD MASS. POSTAGE WILL BE PAID BY ADDRESSEE DIGITAL EQUIPMENT CORPORATION INFORMATION DESIGN AND CONSULTING PKO3–1/D30 129 PARKER STREET MAYNARD, MA 01754–9975...