Summary of Contents for Digital Equipment AlphaServer 1000A
AlphaServer 1000A Service Guide Order Number: EK–ALPSV–SV. A01 Digital Equipment Corporation Maynard, Massachusetts...
First Printing, March 1996 Digital Equipment Corporation makes no representations that the use of its products in the manner described in this publication will not infringe on existing or future patent rights, nor do the descriptions contained in this publication imply the granting of licenses to make, use, or sell equipment or software in accordance with the description.
Preface This guide describes the procedures and tests used to service AlphaServer 1000A systems. AlphaServer 1000A systems use a deskside ‘‘wide-tower’’ enclosure. Intended Audience This guide is intended for use by Digital Equipment Corporation service personnel and qualified self-maintenance customers.
In command descriptions, braces containing items separated by commas imply mutually exclusive items. Related Documentation • AlphaServer 1000A Owner’s Guide, EK-ALPSV-OG • DEC Verifier and Exerciser Tool User’s Guide, AA-PTTMD-TE • Guide to Kernel Debugging, AA-PS2TD-TE •...
• DECevent Analysis and Notification Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73LC-TE • DECevent Analysis and Notification Utility for Digital UNIX, User and Reference Guide AA-QAA4A-TE • StorageWorks RAID Array 200 Subsystems Controller Installation and Standalone Configuration Utility User’s Guide, EK-SWRA2-IG
Troubleshooting Strategy This chapter describes the troubleshooting strategy for AlphaServer 1000A systems. • Section 1.1 provides questions to consider before you begin troubleshooting an AlphaServer 1000A system. • Tables 1–1 through 1–5 provide a diagnostic flow for each category of system problem.
1.1.1 Problem Categories System problems can be classified into the following five categories. Using these categories, you can quickly determine a starting point for diagnosis and eliminate the unlikely sources of the problem. 1. Power problems (Table 1–1) 2. No access to console mode (Table 1–2) 3.
Table 1–1 Diagnostic Flow for Power Problems Symptom Action System does not power on. • Check the power source and power cord. • Check that the system’s top cover is properly secured. A safety interlock switch shuts off power to the system if the top cover is removed. •...
Table 1–2 Diagnostic Flow for Problems Getting to Console Mode Symptom Action Power-up screen is not displayed. Interpret the error beep codes at power-up (Section 2.1) for a failure detected during self-tests. Check that the keyboard and monitor are properly connected and turned on.
Table 1–3 Diagnostic Flow for Problems Reported by the Console Program Symptom Action Power-up tests do not complete. Interpret the error beep codes at power-up (Section 2.1) and check the power-up screen (Section 2.3) for a failure detected during self-tests. Console program reports error: Use the error beep codes (Section 2.1) and/or console terminal (Section 2.3) to determine the error.
Table 1–4 Diagnostic Flow for Boot Problems Symptom Action System cannot find boot device. Check the system configuration for the correct device parameters (node ID, device name, and so on). • For Digital UNIX and OpenVMS, use the show config and show device commands (Section 5.1).
If the system is up, or you are able to bring it up, look at this information first. ROM-Based Diagnostics (RBDs) Many ROM-based diagnostics and exercisers are embedded in AlphaServer 1000A systems. ROM-based diagnostics execute automatically at power-up and can be invoked in console mode using console commands.
RECOMMENDED USE: ROM-based diagnostics are the primary means of testing the console environment and diagnosing the CPU, memory, Ethernet, I/O buses, and SCSI and DSSI subsystems. Use ROM-based diagnostics in the acceptance test procedures when you install a system, add a memory module, or replace the following components: CPU module, memory module, motherboard, I/O bus device, or storage device.
(FRU) procedures and illustrations, is available in online format. You can download the hypertext file (A200A-S.HLP) or a self-extracting .HLP file from TIMA, or order the diskette (AK-QQRMA-CA) or the AlphaServer 1000A Maintenance Kit (QZ-OOUAB-GC). The maintenance kit includes hardcopy, diskette, and illustrated parts breakdown.
Wide SCSI information and more. Supported Options Refer to the AlphaServer 1000A Supported Options List for a list of options supported under Digital UNIX, OpenVMS, and Windows NT. The options list is available from the Internet as follows: •...
You can obtain information about hardware configurations for the AlphaServer 1000A from the Digital Systems and Options Catalog. The catalog is regularly published to assist in ordering and configuring systems and hardware options. Each printing of the catalog presents all of the products that are announced, actively marketed, and available for ordering.
Power-Up Diagnostics and Display This chapter provides information on how to interpret error beep codes and the power-up display on the console screen. In addition, a description of the power-up and firmware power-up diagnostics is provided as a resource to aid in troubleshooting.
2.1 Interpreting Error Beep Codes If errors are detected at power-up, audible beep codes are emitted from the system. For example, if the SROM code could not find any good memory, you would hear a 1-3-3 beep code (one beep, a pause, a burst of three beeps, a pause, and another burst of three beeps).
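As a hedged illustration of how a beep code is read, the sketch below splits a code such as 1-3-3 into its bursts and looks it up in a small table. The table holds only the two codes whose meaning is stated in the text of this guide (1-3-3 here, and 3-3-1 in the description of the power-up tests); the function and dictionary names are hypothetical, and Table 2–1 remains the authority for the full list.

```python
# Hypothetical helper names; only the two codes explained in this guide's text
# are listed. Table 2-1 of the guide is the complete reference.
KNOWN_BEEP_CODES = {
    "1-3-3": "SROM code could not find any good memory",
    "3-3-1": "PCI bridge or EISA bridge failed its power-up test",
}


def beep_bursts(code):
    """Split a beep code such as '1-3-3' into a list of burst lengths."""
    return [int(part) for part in code.split("-")]


def describe(code):
    """Return the meaning of a beep code, or a pointer to Table 2-1."""
    return KNOWN_BEEP_CODES.get(code, "not listed here; see Table 2-1")


if __name__ == "__main__":
    for code in ("1-3-3", "3-3-1"):
        print(code, beep_bursts(code), describe(code))
```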
SCSI controller (Qlogic 1020A): Check that the J1 jumper on the CPU daughter board is set at bank 1 for AlphaServer 1000A systems, as opposed to bank 0, reserved for AlphaServer 1000 systems (Figure 2–1).
PCI-to-PCI bridge (DECchip 21050): Check that the J1 jumper on the CPU daughter board is set at bank 1 for AlphaServer 1000A systems, as opposed to bank 0, reserved for AlphaServer 1000 systems (Figure 2–1).
Test duration: Approximately 10 seconds per 8 megabytes of memory. Figure 2–2 shows the bank and SIMM layout for AlphaServer 1000A systems. After determining the bad SIMM, refer to Chapter 6 for instructions on replacing FRUs. Note: The memory tests do not test the ECC SIMMs. If the operating system logs five or more single-bit correctible...
Test duration: Approximately 2 seconds per 8 megabytes of memory. Figure 2–2 shows the bank and SIMM layout for AlphaServer 1000A systems. After determining the bad SIMM, refer to Chapter 6 for instructions on replacing FRUs. Note: The memory tests do not test the ECC SIMMs. If the operating system logs five or more single-bit correctible...
Test duration: Approximately 2 seconds per 8 megabytes of memory. Figure 2–2 shows the bank and SIMM layout for AlphaServer 1000A systems. After determining the bad SIMM, refer to Chapter 6 for instructions on replacing FRUs. Note: The memory tests do not test the ECC SIMMs. If the operating system logs five or more single-bit correctible...
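The per-8-megabyte durations quoted above can be turned into a rough estimate for a given memory size. This is a minimal sketch, assuming the duration scales linearly with the amount of memory; the function name is hypothetical.

```python
def estimated_test_seconds(memory_mb, seconds_per_8mb):
    """Rough test duration, assuming linear scaling with memory size."""
    if memory_mb <= 0:
        raise ValueError("memory size must be positive")
    return memory_mb / 8 * seconds_per_8mb


if __name__ == "__main__":
    # 32 MB at about 10 seconds per 8 MB, then at about 2 seconds per 8 MB.
    print(estimated_test_seconds(32, 10))
    print(estimated_test_seconds(32, 2))
```

Under that assumption, a 32-megabyte system needs roughly 40 seconds for the slower test and roughly 8 seconds for the faster ones.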
MA00926
Bank Jumper Setting
• Standard boot setting (AlphaServer 1000 systems)
• Standard boot setting (AlphaServer 1000A systems)
• Mini-console setting: Internal use only
• SROM CacheTest: backup cache test
• SROM BCacheTest: backup cache and memory test
• SROM memTest: memory test with backup and data cache disabled
• SROM memTestCacheOn: memory test with backup and data cache enabled
• Fail-Safe Loader setting: selects fail-safe loader firmware...
Use the arrow keys to select, then press Enter. 2.3.1 Console Event Log AlphaServer 1000A systems maintain a console event log consisting of status messages received during power-on self-tests. If problems occur during power-up, standard error messages indicated by asterisks (***) may be embedded in the console event log.
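Because error messages in the console event log are flagged with asterisks, a captured log can be scanned for them mechanically. The following is a minimal sketch, assuming the log text has already been captured (for example, from a serial console session) into a string; the function name is hypothetical, and the sample text is assembled from messages quoted elsewhere in this guide rather than taken from a real log.

```python
def flagged_lines(log_text):
    """Return the lines of a captured event log that contain '***'."""
    return [line.strip() for line in log_text.splitlines() if "***" in line]


if __name__ == "__main__":
    # Illustrative text only, not an actual event log.
    sample = """Testing the memory
*** Hard Error - Error #44 - Memory compare error
Testing the SCSI Disks"""
    for line in flagged_lines(sample):
        print(line)
```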
2.4 Mass Storage Problems Indicated at Power-Up Mass storage failures at power-up are usually indicated by read fail messages. Other problems are indicated by storage devices missing from the show config display. • Table 2–4 provides information for troubleshooting mass storage problems indicated at power-up or storage devices missing from the show config display.
Table 2–4 (Cont.) Mass Storage Problems
Problem: Missing or loose cables. Drives not properly seated on StorageWorks shelf.
Symptom: Activity LEDs do not come on. Drive missing from the show config display.
Corrective Action: Remove device and inspect cable connections. Reseat drive on StorageWorks shelf.
Problem: SCSI bus length...
SCSI controller). Table 2–5 provides troubleshooting hints for AlphaServer 1000A systems that have the StorageWorks RAID Array 200 Subsystem. The RAID subsystem includes either the KZESC-xx (SWXCR-Ex) or the KZPSC-xx (SWXCR-Px) PCI backplane RAID controller.
Table 2–5 (Cont.) Troubleshooting RAID Problems
Symptom: Cannot access disks connected to the RAID subsystem on Windows NT systems.
Action: On Windows NT systems, disks connected to the controller must be spun up before they can be accessed. While running the ECU, verify that the controller is set to spin up two disks every six seconds.
2.6 EISA Bus Problems Indicated at Power-Up EISA bus failures at power-up are usually indicated by the following messages displayed during power-up: EISA Configuration Error. Run the EISA Configuration Utility. Run the EISA Configuration Utility (ECU) (Section 5.4) when this message is displayed.
• The CFG files supplied with the option you want to install may not work on AlphaServer 1000A systems. Some CFG files call overlay files that are not required on this system or may reference inappropriate system resources, for example, BIOS addresses. Contact the option vendor to obtain the proper CFG file.
Refer to the following documents for restrictions on specific PCI options: • AlphaServer 1000A READ THIS FIRST: shipped with the system. • AlphaServer 1000A Supported Options List: available from the Internet at the following locations:
ftp://ftp.digital.com/pub/DEC/Alpha/systems/ http://www.service.digital.com/alpha/server/ 2.8 Fail-Safe Loader The fail-safe loader (FSL) is a redundant or backup ROM that allows you to power up without running power-up diagnostics and load new SRM/ARC and FSL console firmware from the firmware diskette. Note The fail-safe loader should be used only when a failure at power-up prohibits you from getting to the console program.
1. Install the jumper at bank 7 of the J1 jumper on the CPU daughter board (Figure 2–6). The jumper is normally installed in the standard boot setting (bank 1 for AlphaServer 1000A systems). 2. Install the console firmware diskette and turn on the system.
MA00926
Bank Jumper Setting
• Standard boot setting (AlphaServer 1000 systems)
• Standard boot setting (AlphaServer 1000A systems)
• Mini-console setting: Internal use only
• SROM CacheTest: backup cache test
• SROM BCacheTest: backup cache and memory test
• SROM memTest: memory test with backup and data cache disabled
• SROM memTestCacheOn: memory test with backup and data cache enabled
• Fail-Safe Loader setting: selects fail-safe loader firmware...
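The J1 bank assignments stated in the text can be captured in a small lookup for checking a board before power-up. This is a minimal sketch: only banks 0, 1, and 7 are named explicitly in the text of this guide, so the remaining settings in the figure legend are deliberately left out. All names are hypothetical, and the code assumes that any system other than an AlphaServer 1000A is an AlphaServer 1000.

```python
# Banks confirmed by the text of this guide; other banks are omitted on purpose.
J1_BANKS = {
    0: "Standard boot setting (AlphaServer 1000 systems)",
    1: "Standard boot setting (AlphaServer 1000A systems)",
    7: "Fail-Safe Loader setting",
}


def check_j1(bank, system="AlphaServer 1000A"):
    """Return a short verdict on a J1 bank position for the given system."""
    # Assumption: anything that is not a 1000A is treated as a 1000 (bank 0).
    expected = 1 if system == "AlphaServer 1000A" else 0
    if bank == expected:
        return "ok: standard boot setting"
    if bank == 7:
        return "fail-safe loader selected"
    return f"check jumper: expected bank {expected}, found bank {bank}"


if __name__ == "__main__":
    for bank in (0, 1, 7):
        print(bank, check_j1(bank))
```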
Serial ROM diagnostics – Console firmware-based diagnostics Caution The AlphaServer 1000A enclosure will not power up if the top cover is not securely attached. Removing the top cover will cause the system to shut down. 2.9.1 AC Power-Up Sequence The following power-up sequence occurs when AC power is applied to the system (system is plugged in) or when electricity is restored after a power outage: 1.
2.9.2 DC Power-Up Sequence DC power is applied to the system with the DC On/Off button on the operator control panel. A summary of the DC power-up sequence follows: 1. When the DC On/Off button is pressed, the power supply checks for a POK_H condition.
3. Test the system bus to PCI bus bridge and system bus to EISA bus bridge. If the PCI bridge fails or EISA bridge fails, an audible error beep code (3-3-1) sounds (Table 2–1). The power-up tests continue despite these errors. 4.
4. Run exercisers on the drives currently seen by the system. Note This step does not ensure that all disks in the system will be tested or that any device drivers will be completely tested. Spin-up time varies for different drives, so not all disks may be on line at this point in the power-up sequence.
AlphaServer 1000A RBDs rely on exerciser modules, rather than functional tests, to isolate errors. The exercisers are designed to run concurrently, providing a maximum bus interaction between the console drivers and the target devices.
3.2 Command Summary Table 3–1 provides a summary of the diagnostic and related commands.
Table 3–1 Summary of Diagnostic and Related Commands
Command, Function, Reference
Acceptance Testing
test: Quickly tests the core system. The test command is the primary diagnostic for acceptance testing and console environment diagnosis. (Section 3.3.1)
Table 3–1 (Cont.) Summary of Diagnostic and Related Commands
Command, Function, Reference
Loopback Testing
test lb: Conducts loopback tests for COM2 and the parallel port in addition to quick core system tests. (Section 3.3.1)
netew: Runs external MOP loopback tests for specified EISA- or PCI-based ew* (DECchip 21040, TULIP) Ethernet ports. (Section 3.3.4)
3.3.1 test The test command runs firmware diagnostics for the entire core system. The tests are run concurrently in the background. Fatal errors are reported to the console terminal. The cat el command should be used in conjunction with the test command to examine test/error information reported to the console event log.
The test script tests devices in the following order:
1. Console loopback tests if lb argument is specified: COM2 serial port and parallel port.
2. Network external loopback tests for E*A0. This test requires that the Ethernet port be terminated or connected to a live network; otherwise, the test will fail.
Testing the memory
Testing parallel port
Testing the SCSI Disks
Non-destructive Test of the Floppy started
dka400.4.0.6.0 has no media present or is disabled via the RUN/STOP switch
file open failed for dka400.4.0.6.0
Testing the VGA(Alphanumeric Mode only)
Printer offline
file open failed for para
>>>...
3.3.2 cat el and more el The cat el and more el commands display the current contents of the console event log. Status and error messages (if problems occur) are logged to the console event log at power-up, during normal system operation, and while running system tests.
3.3.3 memory The memory command tests memory by running a memory exerciser each time the command is entered. The exercisers are run in the background and nothing is displayed unless an error occurs. The number of exercisers, as well as the length of time for testing, depends on the context of the testing.
The following is an example with a memory compare error indicating bad SIMMs.
>>> memory
>>> memory
>>> memory
*** Hard Error - Error #44 - Memory compare error
Diagnostic Name Device Pass Test Hard/Soft 1-JAN-2066
memtest 000000c8 brd0 12:00:01
Expected value: 00000004
Received value: 80000001
Failing addr:...
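In a memory compare error, the data bits that differ can be found by exclusive-ORing the expected and received values. The sketch below does this for the two values shown in the example. It is an illustration only (the function name is hypothetical) and does not map bits to a particular SIMM, which depends on the bank and SIMM layout in Figure 2–2.

```python
def differing_bits(expected, received):
    """Return the bit positions (0 = least significant) where two values differ."""
    diff = expected ^ received
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


if __name__ == "__main__":
    # Expected and received values taken from the example in this guide.
    print(differing_bits(0x00000004, 0x80000001))
```

For the example values, the exclusive-OR is 0x80000005, so bits 0, 2, and 31 differ.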
3.3.4 netew The netew command is used to run MOP loopback tests for any EISA- or PCI-based ew* (DECchip 21040, TULIP) Ethernet ports. The command can also be used to test a port on a ‘‘live’’ network. The loopback tests are set to run continuously (-p pass_count set to 0). Use the kill command (or ) to terminate an individual diagnostic or the...
Testing an Ethernet Port:
>>> netew
>>> show_status
Program Device Pass Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle system
000000d5 nettest ewa0.0.0.0.0 308672 308672
>>> kill_diags
>>>
3.3.5 network The network command is used to run MOP loopback tests for any EISA- or PCI-based er* (DEC 4220, LANCE) Ethernet ports. The command can also be used to test a port on a ‘‘live’’ network. The loopback tests are set to run continuously (-p pass_count set to 0). Use the kill command (or ) to terminate an individual diagnostic or the...
Testing an Ethernet Port:
>>> network
>>> show_status
Program Device Pass Hard/Soft Bytes Written Bytes Read
-------- ------------ ------------ ------ --------- ------------- -------------
00000001 idle system
000000d5 nettest era0.0.0.0.0 308672 308672
>>> kill_diags
>>>
3.3.8 kill and kill_diags The kill and kill_diags commands terminate diagnostics that are currently executing. Note: A serial loopback connector (12-27351-01) must be installed on the COM2 serial port for the kill_diags command to successfully terminate system tests. • The kill command terminates a specified process. •...
3.3.9 show_status The show_status command reports one line of information per executing diagnostic. The information includes ID, diagnostic program, device under test, error counts, passes completed, bytes written, and bytes read. Many of the diagnostics run in the background and provide information only if an error occurs.
3.4 Acceptance Testing and Initialization Perform the acceptance testing procedure listed below after installing a system or whenever adding or replacing the following:
• Memory modules
• Motherboard
• CPU daughter board
• Storage devices
• EISA or PCI options
1. Run the RBD acceptance tests using the test command.
DECevent Translation and Reporting Utility available with OpenVMS and Digital UNIX. 4.1 Fault Detection and Reporting Table 4–1 provides a summary of the fault detection and correction components of AlphaServer 1000A systems. Generally, PALcode handles exceptions as follows: • The PALcode determines the cause of the exception.
Table 4–1 AlphaServer 1000 Fault Detection and Correction
Component, Fault Detection/Correction Capability
KN22A Processor Module
DECchip 21064 and 21064A microprocessors: Contains error detection and correction (EDC) logic for data cycles. There are check bits associated with all data entering and exiting the 21064(A) microprocessor. A single-bit error on any of the four longwords being read can be corrected (per cycle).
System Machine Check (SCB: 660) A system machine check is a system-detected error, external to the DECchip 21064 microprocessor and possibly not related to the activities of the CPU. These errors are specific to AlphaServer 1000A systems. Fatal errors: •...
Single-bit Dstream ECC error • System transaction terminated with CACK_SERR System Machine Check (SCB: 620) These errors (non-fatal) are AlphaServer 1000A-specific correctable errors. These errors result in the generation of the correctable machine check logout frame: • Correctable read errors •...
4.3 Event Record Translation Systems running Digital UNIX and OpenVMS operating systems use the DECevent management utility to translate events into ASCII reports derived from system event entries (bit-to-text translations). The DECevent utility has the following features relating to the translation of events: •...
System faults can be isolated by examining translated system error logs or using the DECevent Analysis and Notification Utility. Refer to the DECevent Analysis and Notification Utility for OpenVMS Alpha, User and Reference Guide, AA-Q73LC-TE, for more information. 4.3.2 Digital UNIX Translation Using DECevent The kernel error log entries are translated from binary to ASCII using the command.
System Configuration and Setup This chapter provides configuration and setup information for AlphaServer 1000A systems and system options. • Section 5.1 describes how to examine the system configuration using the console firmware. – Section 5.1.1 describes the function of the two firmware interfaces used with AlphaServer 1000A systems.
5.1 Verifying System Configuration Figure 5–1 illustrates the system architecture for AlphaServer 1000A systems. Figure 5–1 System Architecture: AlphaServer 1000A (block diagram; legible labels include CPU Card, 21064, SROM, Bcache, PCI-PCI Bridge, Secondary PCI Bus, QLOGIC ISP1020A, Fast-Wide SCSI Bus, and PCI Slots)...
SRM Command Line Interface Systems running Digital UNIX or OpenVMS access the SRM firmware through a command line interface (CLI). The CLI is a UNIX style shell that provides a set of commands and operators, as well as a scripting facility. The CLI allows you to configure and test the system, examine and alter system state, and boot the operating system.
5.1.2 Switching Between Interfaces For a few procedures it is necessary to switch from one console interface to the other. • The test command is run from the SRM interface. • The EISA Configuration Utility (ECU) and the RAID Configuration Utility (RCU) are run from the ARC interface.
5.1.3.1 Display Hardware Configuration The hardware configuration display provides the following information: • The first screen displays system information, such as the memory, CPU type, speed, NVRAM usage, the ARC version time stamp, and the type of video option detected. •...
Table 5–2 ARC Firmware Device Names
Name: multi(0)key(0)keyboard(0), multi(0)serial(0), multi(0)serial(1)
Description: The multi( ) devices are located on the system module. These devices include the keyboard port and the serial line ports.
Name: eisa(0)video(0)monitor(0), eisa(0)disk(0)fdisk(0)
Description: The eisa( ) devices are provided by devices on the EISA bus.
Example 5–1 (Cont.) Sample Hardware Configuration Display
eisa(0)video(0)monitor(0)
multi(0)key(0)keyboard(0)
eisa(0)disk(0)fdisk(0) (Removable)
multi(0)serial(0)
multi(0)serial(1)
scsi(0)disk(0)rdisk(0) (4 Partitions) DEC RZ29B (C)DEC007
scsi(0)cdrom(0)fdisk(0) (Removable) RRD43 (C) DEC 1084
Press any key to continue...
12/20/1995 9:06:23 AM Wednesday
PCI slot information:
Bus Device Function Vendor Device Revision Interrupt Device
Number Number Number Vector...
Table 5–3 lists and explains the default ARC firmware environment variables. Table 5–3 ARC Firmware Environment Variables Variable Description The default floppy drive. The default value is eisa( )disk( )fdisk( ). AUTOLOAD The default startup action, either YES (boot) or NO or undefined (remain in Windows NT firmware).
5.1.4 Verifying Configuration: SRM Console Commands for Digital UNIX and OpenVMS The following SRM console commands are used to verify system configuration on Digital UNIX and OpenVMS systems: • show config (Section 5.1.4.1): Displays the buses on the system and the devices found on those buses.
Bus 0, Slots 11–13 correspond to physical PCI card cage slots on the primary PCI bus: Slot 11 = PCI11 Slot 12 = PCI12 Slot 13 = PCI13 In the case of storage controllers, the devices off the controller are also displayed.
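The logical slot numbers reported for PCI bus 0 can be kept in a small lookup when reading a show config display. This is a minimal sketch with hypothetical names; it covers only the bus 0 slots that this guide describes (slots 11 through 13 here, and the two bridge chips at slots 7 and 8 listed with the device name convention), and it makes no claim about slots on the secondary bus.

```python
# Logical slot meanings on PCI bus 0, as listed in this guide.
BUS0_SLOTS = {
    7: "PCI-to-EISA bridge chip",
    8: "PCI-to-PCI bridge chip",
    11: "physical slot PCI11",
    12: "physical slot PCI12",
    13: "physical slot PCI13",
}


def describe_bus0_slot(slot):
    """Describe a logical slot number on PCI bus 0."""
    return BUS0_SLOTS.get(slot, "not described in this excerpt")


if __name__ == "__main__":
    for slot in (7, 8, 11, 12, 13):
        print(f"Bus 00 Slot {slot:02d}: {describe_bus0_slot(slot)}")
```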
Synopsis: show config
Example:
>>> show config
Firmware
SRM Console: X4.4-5365
ARC Console: 4.43p
PALcode: VMS PALcode X5.48-115, OSF PALcode X1.35-84
Serial Rom: X2.1
Processor
DECchip (tm) 21064A-6
MEMORY
32 Meg of System Memory
Bank 0 = 32 Mbytes() Starting at 0x00000000
PCI Bus
Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge...
The following show config example illustrates how PCI options that contain a PCI-to-PCI bridge are represented in the display. For each option that contains a PCI-to-PCI bridge, the bus number increments by 1, and the logical slot numbers start anew at 0. The sample system configuration contains the following options: •...
Example:
>>> show config
Firmware
SRM Console: X4.4-5365
ARC Console: 4.43p
PALcode: VMS PALcode X5.48-115, OSF PALcode X1.35-84
Serial Rom: X2.1
Processor
DECchip (tm) 21064A-6
MEMORY
32 Meg of System Memory
Bank 0 = 32 Mbytes() Starting at 0x00000000
PCI Bus
Bus 00 Slot 07: Intel 8275EB PCI to Eisa Bridge
Bus 00 Slot 08: Digital PCI to PCI Bridge Chip...
5.1.4.2 show device The show device command displays the devices and controllers in the system. The device name convention is shown in Figure 5–2.
Figure 5–2 Device Name Convention
dka0.0.0.0.0
Hose Number: 0 = PCI_0 (32-bit PCI); 1 = EISA
Logical Slot Number:
For EISA options: Corresponds to EISA option physical slot numbers (1 and 2)
For PCI options:
Slot 7 = PCI to EISA bridge chip
Slot 8 = PCI to PCI bridge chip...
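A device name such as dka0.0.0.0.0 can be split into its fields programmatically. The sketch below is hedged: this excerpt describes only the last two fields (logical slot number and hose number), so the names given to the other fields are assumptions based on the usual layout of SRM device names, and the function and class names are hypothetical.

```python
import re
from typing import NamedTuple


class DeviceName(NamedTuple):
    driver: str   # two-letter driver ID, such as "dk" (assumed field name)
    adapter: str  # adapter letter, such as "a" (assumed field name)
    unit: int     # device unit number (assumed field name)
    node: int     # bus node number (assumed field name)
    channel: int  # channel number (assumed field name)
    slot: int     # logical slot number, as described in Figure 5-2
    hose: int     # hose number: 0 = PCI_0, 1 = EISA, as described in Figure 5-2


_PATTERN = re.compile(r"^([a-z]{2})([a-z])(\d+)\.(\d+)\.(\d+)\.(\d+)\.(\d+)$")


def parse_device_name(name):
    """Split a console device name into its fields."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"unrecognized device name: {name!r}")
    driver, adapter, *numbers = match.groups()
    return DeviceName(driver, adapter, *map(int, numbers))


if __name__ == "__main__":
    print(parse_device_name("dka400.4.0.6.0"))
    print(parse_device_name("era0.0.0.2.1"))
```

Applied to the names in the show device example, dka400.4.0.6.0 yields logical slot 6 on hose 0, and era0.0.0.2.1 yields slot 2 on hose 1 (EISA).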
Example:
>>> show device
dka400.4.0.6.0 DKA400 RRD43 2893
dva0.0.0.0.1 DVA0
era0.0.0.2.1 ERA0 08-00-2B-BC-93-7A
pka0.7.0.6.0 PKA0 SCSI Bus ID 7
>>>
Columns: console device name; node name (alphanumeric, up to 6 characters); device type; firmware version (if known)
5.1.4.3 show memory The show memory command displays information for each bank of memory in the system.
show envar Arguments: envar The name of the environment variable to be modified. value The value that is assigned to the environment variable. This may be an ASCII string. Options: -default Restores variable to its default value. -integer Creates variable as an integer. -string Creates variable as a string (default).
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function bootdef_dev The device or device list from which booting is to be attempted, when no path is specified on the command line. Set at factory to disk with Factory Installed Software;...
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function bus_probe_ Specifies a bus probe algorithm for the system. algorithm OLD—Systems running OpenVMS V6.1 or earlier must set the bus probe algorithm to old—Failure to do so could result in bugcheck errors when booting from an EISA device.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function er*0_protocols, Determines which network protocols are enabled for ew*0_protocols booting and other functions. ‘‘mop’’—Sets the network protocol to MOP: the setting typically used for systems using the OpenVMS operating system. ‘‘bootp’’—Sets the network protocol to bootp: the setting typically used for systems using the Digital UNIX operating system.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function pci_parity Disable or enable parity checking on the PCI bus. ON—PCI parity enabled. OFF—PCI parity disabled. Some PCI devices do not implement PCI parity checking, and some have a parity-generating scheme in which the parity is sometimes incorrect or is not fully compliant with the PCI specification.
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function pk*0_host_id Sets the controller host bus node ID to a value between 0 and 7. 0 to 7—Assigns bus node ID for specified host adapter. pk*0_soft_term Enables or disables SCSI terminators. This environ- ment variable applies to systems using the QLogic ISP1020 SCSI controller.
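The 0 to 7 range for pk*0_host_id lends itself to a simple check before a value is typed at the console. The following is a minimal sketch that only builds the text of a set command in the "set envar value" form this guide describes; it does not communicate with the console, and the function name is hypothetical.

```python
def set_host_id_command(controller, host_id):
    """Build a console 'set' command line for a pk*0_host_id variable."""
    if not (len(controller) == 1 and controller.isalpha()):
        raise ValueError("controller must be a single letter, such as 'a'")
    if not 0 <= host_id <= 7:
        raise ValueError("host ID must be between 0 and 7")
    return f"set pk{controller.lower()}0_host_id {host_id}"


if __name__ == "__main__":
    # pka0 is the controller name shown in the show device example.
    print(set_host_id_command("a", 7))
```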
Table 5–4 (Cont.) Environment Variables Set During System Configuration Variable Attributes Function tga_sync_green Sets the location of the SYNC signal generated by the ZLXp-E PCI graphics accelerator (PBXGA). This environment variable must be set correctly so that the graphics monitor will synchronize. The parameter is a bit mask, where the least significant bit (LSB) sets the vertical SYNC for the first graphics card found, the second for the second found, and so on.
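The bit-mask layout described for tga_sync_green (least significant bit for the first graphics card found, the next bit for the second, and so on) can be illustrated with a short sketch. The function name is hypothetical, and the code shows only how card positions map to bits; what a set or cleared bit selects is not stated in this excerpt, so the sketch takes no position on it.

```python
def sync_mask(card_indexes):
    """Build a bit mask with one bit per graphics card (bit 0 = first card found)."""
    mask = 0
    for index in card_indexes:
        if index < 0:
            raise ValueError("card index must be non-negative")
        mask |= 1 << index
    return mask


if __name__ == "__main__":
    # Bits for the first and third cards found.
    print(sync_mask([0, 2]))
```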
the system by entering the init command or pressing the Reset button. 5.2 System Bus Options The system bus interconnects the CPU and memory modules. Figure 5–3 shows the card cage and bus locations. Figure 5–3 Card Cages and Bus Locations Power Diskette Drive Connectors...
SROM code (SROM tests are controlled by jumper J6 on the CPU daughter board) 5.2.2 Memory Modules AlphaServer 1000A systems can support from 16 megabytes to 1024 megabytes of memory. Memory options consist of five single in-line memory modules (SIMMs) and are available in the following variations: •...
Table 5–5 provides the memory requirements and recommendations for each operating system. Table 5–5 Operating System Memory Requirements Operating System Memory Requirements Digital UNIX and 32 MB minimum; 64 MB recommended OpenVMS Windows NT 16 MB minimum; 32 MB recommended Windows NT Server 32 MB minimum;...
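The figures in Table 5–5 can be used to check an installed memory size against an operating system's requirements. This is a minimal sketch with hypothetical names; the recommended value for Windows NT Server is not legible in this excerpt, so it is recorded as None rather than guessed.

```python
# (minimum MB, recommended MB) from Table 5-5; None marks a value that is
# not legible in this excerpt.
MEMORY_MB = {
    "Digital UNIX": (32, 64),
    "OpenVMS": (32, 64),
    "Windows NT": (16, 32),
    "Windows NT Server": (32, None),
}


def memory_verdict(os_name, installed_mb):
    """Compare installed memory with the minimum and recommended sizes."""
    minimum, recommended = MEMORY_MB[os_name]
    if installed_mb < minimum:
        return "below minimum"
    if recommended is None:
        return "meets minimum"
    if installed_mb < recommended:
        return "meets minimum, below recommended"
    return "ok"


if __name__ == "__main__":
    print(memory_verdict("Windows NT", 16))
    print(memory_verdict("OpenVMS", 64))
```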
• Two serial ports with full modem control and the parallel port • The keyboard and mouse interface • CIRRUS VGA controller • The speaker interface • PCI-to-PCI bridge chip set (PPB) • PCI-to-EISA bridge chip set • EISA system component chip •...
The EISA bus is a superset of the well-established ISA bus and has been designed to be backward compatible with 16-bit and 8-bit architecture. Therefore, ISA modules can be used in AlphaServer 1000A systems, provided the operating system supports the device.
Before you install an option, check that the system supports the option. The version of the ECU that ships with AlphaServer 1000A systems accommodates 8 EISA slots. However, AlphaServer 1000A systems have only two EISA slots, slots 1 and 2.
• If you are configuring an EISA bus that contains both ISA and EISA options, refer to Table 5–7. 4. Locate the correct ECU diskette for your operating system. The ECU diskette is shipped in the accessories box with the system. Make a copy of the appropriate diskette, and keep the original in a safe place.
The system displays ‘‘loading ARC firmware.’’ When the firmware has finished loading, the ECU program is booted. 3. Complete the ECU procedure according to the guidelines provided in the following sections. • If you are configuring an EISA bus that contains only EISA options, refer to Table 5–6.
5.6.3 Configuring EISA Options EISA boards are recognized and configured automatically. Study Table 5–6 for a summary of steps to configure an EISA bus that contains no ISA options. Review Section 5.6.1. Then run the ECU as described in Section 5.6.2. Note It is not necessary to run Step 2 of the ECU, ‘‘Add or remove boards.’’...
5.6.4 Configuring ISA Options ISA boards are configured manually, whereas EISA boards are configured through the ECU software. Study Table 5–7 for a summary of steps to configure an EISA bus that contains both EISA and ISA options. Review Section 5.6.1. Then run the ECU as described in Section 5.6.2.
PCI (Peripheral Component Interconnect) is an industry-standard expansion I/O bus that is the preferred bus for high-performance I/O options. The AlphaServer 1000A provides seven slots for 32-bit PCI options. A PCI board is shown in Figure 5–6. Figure 5–6 PCI Board MA00080 Install PCI boards according to the instructions supplied with the option.
5.7.1 PCI-to-PCI Bridge AlphaServer 1000A systems have a PCI-to-PCI bridge (DECchip 21050) on the motherboard. • Physical PCI slots 11, 12, and 13 (primary PCI) are located before the bridge. • Physical PCI slots 1, 2, 3 and 4 (secondary PCI) are located behind the bridge.
When configuring the StorageWorks shelf, note the following: • Narrow SCSI (8-bit) devices can be used in the wide StorageWorks shelf, as long as the devices are at a supported revision level. The narrow devices will run in narrow mode. •...
5.8.3 SCSI Bus Configurations Table 5–8 provides descriptions of the SCSI configurations available using single, dual, and triple controllers, as well as single and split StorageWorks backplanes. Table 5–8 SCSI Storage Configurations SCSI Buses Configuration Single The native Fast-SCSI-2 controller on the backplane provides 8-bit SCSI support for the removable-media bus;...
Figure 5–7 Single Controller Configuration (MA00900). Figure labels: Bus A; Bus ID 4; Bus ID 5; StorageWorks Shelf (Front); StorageWorks Backplane (Rear); External Terminator. Part numbers shown: 12-45490-01, 17-04233-01, 17-04021-01, 12-41667-05, 17-04022-01.
Figure 5–8 Dual Controller Configuration with Split StorageWorks Backplane (MA00950). Figure labels: Bus A; Bus B; Bus ID 4; Bus ID 5; Controller Option Card; StorageWorks Shelf (Front); StorageWorks Backplane (Rear); External Terminators. Part numbers shown: 17-04233-01, 12-45490-01, 17-04022-01, 12-41667-05, 17-04019-01, 12-41667-04, 17-04022-02.
Figure 5–9 Triple Controller Configuration with Split StorageWorks Backplane (MA00902). Figure labels: Bus A; Bus B; Bus C; Bus ID 4; Bus ID 5; Controller Option Cards; StorageWorks Shelf (Front); StorageWorks Backplane (Rear); External Terminators. Part numbers shown: 17-04233-01, 17-04022-01, 12-41667-05, 17-04019-01, 12-41667-04.
AlphaServer 1000A systems offer added reliability through redundant power options and UPS options. The power supplies for AlphaServer 1000A systems support two modes of operation. Refer to Figure 5–10. Power supply modes of operation: 1....
Figure 5–10 Power Supply Configurations (figure not reproduced). Panels: Single (400 watts DC or less); Redundant (400 watts DC or less).
The H7290-AA power supply kit is used to order a second power supply and current sharing cable.
5.10 Console Port Configurations
Power-up information is typically displayed on the system’s console terminal. The console terminal may be either a graphics monitor or a serial terminal (connected through the COM1 serial port). Several SRM console environment variables are related to configuring the console ports: Environment Variable Description...
serial Sets the power-up output to be displayed on the device that is connected to the COM1 port at the rear of the system.
Example:
P00>>> set console serial
P00>>> init
.
!Now switch to the serial terminal.
P00>>> show console
console serial
5.10.2 set tt_allow_login...
5.10.3 set tga_sync_green
The tga_sync_green environment variable sets the location of the SYNC signal generated by the ZLXp-E PCI graphics accelerator card. The correct setting, displayed with the show command, is:
>>> show tga_sync_green
tga_sync_green
If the monitor does not synchronize, set the parameter as follows:
>>>...
5.10.5 Using a VGA Controller Other than the Standard On-Board When the system is configured to use a PCI- or EISA-based VGA controller instead of the standard on-board VGA (CIRRUS), consider the following: • The on-board CIRRUS VGA options must be set to disabled through the ECU. •...
AlphaServer 1000A FRU Removal and Replacement This chapter describes the field-replaceable unit (FRU) removal and replacement procedures for AlphaServer 1000A systems, which use a deskside ‘‘wide-tower’’ enclosure. • Section 6.1 lists the FRUs for AlphaServer 1000A-series systems. • Section 6.2 provides the removal and replacement procedures for the FRUs.
Figure 6–2 FRUs, Rear Left (figure not reproduced). Labels: Memory; Upper Fan; Lower Fan; SCSI Cables; Speaker; Power Cord; SCSI Removable Media Cable; CPU Daughter Board; Motherboard; NVRAM Chip (E14); NVRAM TOY Clock Chip (E78).
(29-26246) and grounded work surface when working with internal parts of a computer system. Unless otherwise specified, you can install an FRU by reversing the steps shown in the removal procedure.
Figure 6–3 Opening Front Door (figure not reproduced)
6.2.1 Cables
This section shows the routing for each cable in the system.
Figure 6–5 Floppy Drive Cable (34-Pin) (figure not reproduced). Part number shown: 17-03970-02.
The power supply DC cable assembly contains the following cables:
• Power supply signal/misc cable (15-pin)
• Power supply +5V cable (24-pin)
• Power supply +3.3V cable (20-pin)
Figure 6–9 Removing Cable Channel Guide (figure not reproduced). Label: Cable Channel Guide.
Figure 6–14 shows the 17-04022-01 SCSI cable used from the native wide SCSI controller to the J17 connector of the StorageWorks backplane, and the 17-04022-02 SCSI cable used from the option controller to the J11 connector of the StorageWorks backplane. In Figure 6–15, only the 17-04022-02 variant is used, in a single-bus configuration.
Figure 6–15 Wide-SCSI (Controller to StorageWorks Shelf) Cable (68-Pin) (figure not reproduced). Labels: External Terminator; StorageWorks Backplane (Rear). Part numbers shown: 12-45490-01, 17-04233-01, 17-04019-01, 12-41667-05, 17-04022-02.
Figure 6–18 Removing CPU Daughter Board (figure not reproduced). Callout text: Crossbar Retaining Screw CPU Card Handle Clips.
Warning: CPU and memory modules have parts that operate at high temperatures. Wait 2 minutes after power is removed before handling these modules.
BLOCKING ACCESS TO THE FAN SCREWS. See Figure 6–18 for removing the CPU daughter board.
STEP 2: DISCONNECT THE FAN CABLE FROM THE MOTHERBOARD AND REMOVE FAN.
Figure 6–19 Removing Fans (figure not reproduced). Labels: Upper Fan; Lower Fan.
However, you will not need to power down the server before installing the drives.
Figure 6–20 Removing StorageWorks Drive (figure not reproduced)
(SIMM layout figure not reproduced. Labels: SIMM 0; SIMM 2; ECC Banks: ECC SIMM for Bank 0, ECC SIMM for Bank 1, ECC SIMM for Bank 2, ECC SIMM for Bank 3.)
STEP 3: REPLACE THE FAILING SIMMS.
(Two further SIMM layout figures not reproduced. Labels: Bank 0: SIMM 0, SIMM 1, SIMM 2, SIMM 3; ECC Banks: ECC SIMM for Bank 0, ECC SIMM for Bank 1, ECC SIMM for Bank 2, ECC SIMM for Bank 3.)
Note: When installing SIMMs, make sure that the SIMMs are fully seated. The two latches on each SIMM connector should lock around the edges of the SIMMs.
6.2.9 Motherboard
STEP 1: RECORD THE POSITION OF EISA AND PCI OPTIONS.
STEP 2: REMOVE EISA AND PCI OPTIONS.
STEP 3: REMOVE CPU DAUGHTER BOARD.
Figure 6–27 Removing EISA and PCI Options (figure not reproduced)
Wait 2 minutes after power is removed before handling these modules.
STEP 4: DETACH MOTHERBOARD CABLES, REMOVE SCREWS AND MOTHERBOARD.
Caution: When replacing the system bus motherboard, install the screws in the order indicated.
Move the socketed NVRAM chip (position E14) and NVRAM TOY chip (E78) to the replacement motherboard and set the jumpers to match previous settings.
Note: The NVRAM TOY chip contains the os_type environment variable. This environment variable may need to be reset (Section 5.1.4.4).
The default setting is for the SRM console for OpenVMS or Digital UNIX operating systems.
6.2.11 OCP Module
STEP 1: REMOVE FRONT DOOR.
STEP 2: REMOVE FRONT PANEL.
STEP 3: REMOVE OCP MODULE.
Figure 6–31 Removing Front Door (figure not reproduced)
Default Jumper Settings This appendix provides the location and default setting for all jumpers in AlphaServer 1000A systems: • Section A.1 provides location and default settings for jumpers located on the motherboard. • Section A.2 provides the location and supported settings for jumpers J3 and J4 on the CPU daughter board.
A.1 Motherboard Jumpers
Figure A–1 shows the location and default settings for jumpers located on the motherboard.
Figure A–1 Motherboard Jumpers (Default Settings) (figure not reproduced). Jumpers labeled: Large Fan (J16); VGA Enable (J27); Small Fan (J55); Force Shutdown (J53); Flash ROM VPP Enable (J50); Temperature Shutdown (J52); Fan Fault (J56)...
Large Fan: Allows the large fan to be disabled to accommodate the alternative enclosures. Default setting: This jumper is not installed on AlphaServer 1000A systems.
Remote Console Module (RCM): When enabled, activates the RCM DC enable connector (J17). Default setting: Disabled (as shown in Figure A–1)....
A.2 CPU Daughter Board (J3 and J4) Supported Settings Figure A–2 shows the supported AlphaServer 1000A 4/266 settings for the J3 and J4 jumpers on the CPU daughter board. These jumpers affect clock speed and other critical system settings. Figure A–3 shows the supported AlphaServer 1000A 4/233 settings for the J3 and J4 jumpers on the CPU daughter board.
Bank Jumper Setting
• Standard boot setting (AlphaServer 1000 systems)
• Standard boot setting (AlphaServer 1000A systems)
• Mini-console setting: Internal use only
• SROM CacheTest: backup cache test
• SROM BCacheTest: backup cache and memory test
• SROM memTest: memory test with backup and data cache disabled
• SROM memTestCacheOn: memory test with backup and data cache enabled
• Fail-Safe Loader setting: selects fail-safe loader firmware...
Glossary
10Base-T Ethernet network
IEEE standard 802.3-compliant Ethernet products used for local distribution of data. These networking products characteristically use twisted-pair cable.
ARC
User interface to the console firmware for operating systems that require firmware compliance with the Windows NT Portable Boot Loader Specification. ARC stands for Advanced RISC Computing....
backup cache
A second, very fast cache memory that is closely coupled with the processor.
bandwidth
The rate of data transfer in a bus or I/O channel. The rate is expressed as the amount of data that can be transferred in a given time, for example megabytes per second....
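The bandwidth definition above reduces to a one-line calculation. This is a minimal Python sketch under stated assumptions: the function name is hypothetical, and one megabyte is taken as 1,000,000 bytes, which the glossary does not specify.

```python
def bandwidth_mb_per_s(bytes_transferred: int, seconds: float) -> float:
    """Express a transfer as megabytes per second (assuming 1 MB = 1,000,000 bytes)."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return bytes_transferred / 1_000_000 / seconds
```

For example, calling the function with 264,000,000 bytes and 2.0 seconds describes a transfer that moved that much data in two seconds.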
bystander
A system bus node (CPU or memory) that is not addressed by a current system bus commander.
byte
A group of eight contiguous bits starting on an addressable byte boundary. The bits are numbered right to left, 0 through 7.
cache memory
A small, high-speed memory placed between slower main memory and the processor....
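The bit numbering given in the byte entry above (right to left, 0 through 7) can be illustrated with a short Python sketch. The function name is hypothetical, and the sketch assumes the usual reading in which the rightmost bit, bit 0, is the least significant.

```python
def bit(value: int, n: int) -> int:
    """Return bit n of a byte; bit 0 is the rightmost (least significant) bit."""
    if not 0 <= value <= 0xFF:
        raise ValueError("value must fit in one byte (0 through 255)")
    if not 0 <= n <= 7:
        raise ValueError("bit number must be 0 through 7")
    return (value >> n) & 1
```

Under this numbering, the byte written as 10000000 in binary has bit 7 set and bit 0 clear.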
cluster
A group of networked computers that communicate over a common interface. The systems in the cluster share resources, and software programs work in close cooperation.
cold bootstrap
A bootstrap operation following a power-up or system initialization (restart). On Alpha-based systems, the console loads PALcode, sizes memory, and initializes environment variables....
data cache
A high-speed cache memory reserved for the storage of data. Abbreviated as D-cache.
DEC VET
Digital DEC Verifier and Exerciser Tool. A multipurpose system diagnostic tool that performs exerciser-oriented maintenance testing.
diagnostic program
A program that is used to find and correct problems with a computer system.
Digital UNIX
A general-purpose operating system based on the Open Software Foundation technology....
ECC
Error correction code. Code and algorithms used by logic to facilitate error detection and correction.
EEPROM
Electrically erasable programmable read-only memory. A memory device that can be byte-erased, written to, and read from.
EISA bus
Extended Industry Standard Architecture bus. A 32-bit industry-standard I/O bus used primarily in high-end PCs and servers....
FIB
Flexible interconnect bridge. A converter that allows the expansion of the system enclosure to other DSSI devices and systems.
field-replaceable unit
Any system component that a qualified service person is able to replace on site.
firmware
Software code stored in hardware.
fixed-media compartments
Compartments that house nonremovable storage media....
instruction cache
A high-speed cache memory reserved for the storage of instructions. Abbreviated as I-cache.
interrupt request lines (IRQs)
Bus signals that connect an EISA or ISA module (for example, a disk controller) to the system so that the module can get the system’s attention through an interrupt....
MAU
Medium attachment unit. On an Ethernet LAN, a device that converts the encoded data signals from various cabling media (for example, fiber optic, coaxial, or ThinWire) to permit connection to a networking station.
memory interleaving
The process of assigning consecutive physical memory addresses across multiple memory controllers....
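The memory interleaving entry above can be sketched as a simple address-to-controller mapping. This Python example is illustrative only: the function name is hypothetical, and the 64-byte interleave granule is an assumption, since the glossary does not state how much memory each controller serves before the next one takes over.

```python
def controller_for(address: int, num_controllers: int, granule: int = 64) -> int:
    """Pick the memory controller that owns a physical address.

    Consecutive granules of `granule` bytes rotate across the controllers.
    The 64-byte default is an assumption chosen for illustration.
    """
    if address < 0 or num_controllers < 1 or granule < 1:
        raise ValueError("address must be >= 0; num_controllers and granule must be >= 1")
    return (address // granule) % num_controllers
```

With two controllers, successive 64-byte granules alternate between controller 0 and controller 1, so a sequential access stream is spread across both rather than queuing on one.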
NVRAM
Nonvolatile random-access memory. Memory that retains its information in the absence of power.
OCP
Operator control panel.
open system
A system that implements sufficient open specifications for interfaces, services, and supporting formats to enable applications software to:
• Be ported across a wide range of systems with minimal changes
•...
portability
The degree to which a software application can be easily moved from one computing environment to another.
porting
Adapting a given body of code so that it will provide equivalent functions in a computing environment that differs from the original implementation environment....
reliability
The probability a device or system will not fail to perform its intended functions during a specified time.
responder
In any particular bus transaction, memory, CPU, or I/O that accepts or supplies data in response to a command/address from the system bus commander.
RISC
Reduced instruction set computer....
SRM
User interface to console firmware for operating systems that expect firmware compliance with the Alpha System Reference Manual (SRM).
storage array
A group of mass storage devices, frequently configured as one logical disk.
StorageWorks
Digital’s modular storage subsystem (MSS), which is the core technology of the Alpha SCSI-2 mass storage solution....
test-directed diagnostics (TDDs)
An approach to diagnosing computer system problems whereby error data logged by diagnostic programs resident in read-only memory (RBDs) is analyzed to capture information about the problem.
thickwire
One-half inch, 50-Ohm coaxial cable that interconnects the components in many IEEE standard 802.3-compliant Ethernet networks....
write back
A cache management technique in which data from a write operation to cache is written into main memory only when the data in cache must be overwritten.
write-enabled
Indicates a device onto which data can be written.
write-protected
Indicates a device onto which data cannot be written....
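The write back entry above describes a policy that is easy to show in miniature. The following Python sketch is a toy model, not a description of the AlphaServer 1000A cache hardware: the class name, the dictionary standing in for main memory, and the oldest-entry-first eviction order are all assumptions made for illustration.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy write-back cache in front of a dict that stands in for main memory."""

    def __init__(self, memory: dict, capacity: int = 2):
        self.memory = memory
        self.capacity = capacity
        self.lines = OrderedDict()  # address -> (value, dirty flag)

    def write(self, address, value):
        """Write to the cache only; main memory is not touched here."""
        if address not in self.lines and len(self.lines) >= self.capacity:
            self._evict()
        self.lines[address] = (value, True)
        self.lines.move_to_end(address)

    def _evict(self):
        """Drop the least recently written entry, copying it to memory if dirty."""
        address, (value, dirty) = self.lines.popitem(last=False)
        if dirty:
            self.memory[address] = value
```

In this model a write updates only the cache. Main memory receives the value later, at the moment the cached entry has to make room for another address, which is the behavior the glossary entry describes.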