Early Warning/Fault Tolerance; Overtemperature; Memory Errors - HP 9020 Service Manual

HP 9000 Series 500 Model 520

Testing and Troubleshooting
Early Warning/Fault Tolerance
The computer provides early warning of several probable failures. These warnings enable the user
to schedule maintenance at his convenience, reducing downtime due to unexpected failure. Early
warning of machine failure is provided for overtemperature conditions and memory bit errors.
A battery assembly on the keyboard processor board protects the contents of the real-time clock
and non-volatile memory (RTC/NVM). The battery assembly is fault tolerant in that four batteries
comprise the assembly but the circuit requires only three batteries to maintain RTC/NVM data.
Overtemperature
The computer contains three dc box fans. One fan is in the I/O card cage and operates at a single
speed whenever power is applied. The other two fans have three speeds and are associated with
the power supply and the processor stack.
The power supply and the processor stack CPU each contain a temperature sensor. The power supply
temperature sensor controls the two three-speed fans. When the temperature in the power supply
rises above 39°C, the power supply steps both fans from low to medium speed. When the temperature
rises above 51°C, the fans are stepped from medium to high speed. When high speed is
required for proper cooling, a message is issued to the user, warning that shutdown is
imminent if the temperature continues to rise.
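
The fan stepping described above can be sketched as a simple threshold lookup. This is only an illustration: the function and constant names are hypothetical, and no hysteresis is modeled because the manual does not state the temperatures at which the fans step back down.

```python
from enum import Enum

# Thresholds from the text above, in degrees Celsius.
MEDIUM_THRESHOLD_C = 39
HIGH_THRESHOLD_C = 51


class FanSpeed(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def fan_speed(supply_temp_c):
    """Speed of the two three-speed fans for a given power supply temperature.

    Hypothetical helper; the real control is done by the power supply hardware.
    """
    if supply_temp_c > HIGH_THRESHOLD_C:
        return FanSpeed.HIGH
    if supply_temp_c > MEDIUM_THRESHOLD_C:
        return FanSpeed.MEDIUM
    return FanSpeed.LOW


def warning_needed(supply_temp_c):
    """The user message is issued once high speed is required."""
    return fan_speed(supply_temp_c) is FanSpeed.HIGH
```

The comparisons are strict because the text says the fans step when the temperature "rises above" each threshold.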
If the temperature at the power supply sensor exceeds 97°C or the processor stack CPU sensor
senses a temperature greater than 100°C, the power supply shuts down and one of the
overtemperature LEDs on the power supply lights (STACK TEMP or SEC BOARD). These LEDs are
visible when the front cover is removed.
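
The shutdown condition can be expressed as a short check, which may help when interpreting the LEDs. This is a sketch under assumptions: the helper names are hypothetical, and the pairing of each LED with a sensor (STACK TEMP with the processor stack CPU sensor, SEC BOARD with the power supply sensor) is inferred from the LED names and is not stated in the text.

```python
from typing import List

SUPPLY_LIMIT_C = 97    # power supply sensor limit from the text
STACK_LIMIT_C = 100    # processor stack CPU sensor limit from the text


def tripped_leds(supply_temp_c, stack_temp_c) -> List[str]:
    """Return the overtemperature LEDs expected to light.

    The LED-to-sensor pairing is an assumption, not taken from the manual.
    """
    leds = []
    if supply_temp_c > SUPPLY_LIMIT_C:
        leds.append("SEC BOARD")
    if stack_temp_c > STACK_LIMIT_C:
        leds.append("STACK TEMP")
    return leds


def must_shut_down(supply_temp_c, stack_temp_c) -> bool:
    """The power supply shuts down if either limit is exceeded."""
    return bool(tripped_leds(supply_temp_c, stack_temp_c))
```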
Memory Errors
The processor stack memory controller chip detects all single-bit and double-bit RAM failures and
corrects single-bit failures. Detection and correction are performed at run time.
When a double-bit or greater failure is detected, the CPU is notified and the entire system halts. A
message is issued indicating which memory finstrate has failed.
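
The behavior described (correct one bit, detect two) is what a single-error-correcting, double-error-detecting code provides. The manual does not say which code the memory controller uses, so the sketch below substitutes a small extended Hamming(8,4) code purely to illustrate the principle. All names are hypothetical.

```python
def encode(nibble):
    """Encode a 4-bit value as 8 bits: index 0 is overall parity,
    indexes 1, 2, 4 are Hamming parity bits, 3, 5, 6, 7 hold data."""
    code = [0] * 8
    code[3] = nibble & 1
    code[5] = (nibble >> 1) & 1
    code[6] = (nibble >> 2) & 1
    code[7] = (nibble >> 3) & 1
    for p in (1, 2, 4):
        for i in range(1, 8):
            if i != p and i & p:
                code[p] ^= code[i]
    code[0] = sum(code[1:]) % 2
    return code


def decode(code):
    """Return (status, value): 'ok', 'corrected', or 'halt' (value None)."""
    syndrome = 0
    for position in range(1, 8):
        if code[position]:
            syndrome ^= position
    parity_odd = sum(code) % 2 == 1
    if syndrome == 0 and not parity_odd:
        fixed, status = code, "ok"
    elif parity_odd:
        # Odd overall parity: treat as a single-bit error. The syndrome is
        # the failing position (0 means the overall parity bit itself).
        fixed = list(code)
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: double-bit error.
        return "halt", None
    value = fixed[3] | fixed[5] << 1 | fixed[6] << 2 | fixed[7] << 3
    return status, value
```

In this toy code a single flipped bit is repaired transparently, while any two flipped bits produce the "halt" result, mirroring the system halt described above. Errors of three or more bits are not reliably detected by this small code.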
When a single-bit failure is detected, the failure can be corrected and healed by pointing future
accesses of that location to a location in the healer RAM of the memory controller chip. Each
memory controller has 32 locations reserved for healing of RAM. When all 32 locations have been
used, the CPU is notified that the healer is full. The operating system then tests each of the healed
locations to determine if that location is still faulty or if a soft error caused the failure. Overflowed
healer CAMs can be cleared and reused by the operating system.
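
The healing scheme can be modeled as a small remap table. The following is a minimal sketch, assuming a table that maps a faulty RAM address to one of the 32 healer locations. The class and method names are hypothetical, and the retest callback stands in for whatever memory test the operating system actually runs.

```python
HEALER_SLOTS = 32  # locations per memory controller, from the text


class Healer:
    """Toy model of one memory controller's healer (hypothetical API)."""

    def __init__(self, slots=HEALER_SLOTS):
        self.slots = slots
        self.healed = {}  # faulty RAM address -> healer slot index

    def is_full(self):
        return len(self.healed) >= self.slots

    def heal(self, address):
        """Point future accesses of address at a free healer slot.

        Returns True when the healer has just become (or already is) full,
        which is when the CPU would be notified.
        """
        if address not in self.healed:
            if self.is_full():
                raise RuntimeError("healer full")
            used = set(self.healed.values())
            slot = next(s for s in range(self.slots) if s not in used)
            self.healed[address] = slot
        return self.is_full()

    def resolve(self, address):
        """Return where an access to address is actually served from."""
        if address in self.healed:
            return ("healer", self.healed[address])
        return ("ram", address)

    def retest(self, still_faulty):
        """Operating system pass: free slots whose RAM location tests good,
        meaning the original failure was a soft error.
        Returns the number of locations that remain healed."""
        for address in list(self.healed):
            if not still_faulty(address):
                del self.healed[address]
        return len(self.healed)
```

For example, after 32 calls to `heal` with distinct addresses the last call returns True, and a `retest` pass with a callback that reports some locations as good frees those slots for reuse.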
