Early Warning/Fault Tolerance; Overtemperature; Memory Errors - HP 9030 Service Manual

HP 9000 Series 500 Computers Service Manual

3-18 Testing and Troubleshooting
Early Warning/Fault Tolerance
The computer provides early warning of several probable failures. These warnings let the user
schedule maintenance at a convenient time, reducing downtime due to unexpected failure. Early
warning of machine failure is provided for overtemperature conditions and memory errors.
A battery assembly on the system control module maintains the contents of the real-time clock
(RTC) and non-volatile memory (NVM) when power is removed. The assembly is fault tolerant: it
contains four batteries, but the circuit needs only three of them to maintain RTC/NVM data, so the
failure of any one battery does not cause data loss.
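As a rough illustration of the 3-of-4 redundancy, the following sketch models the assembly as a count of working batteries. It is a hypothetical model, not HP circuit logic: the names `rtc_nvm_retained` and `failures_tolerated` are invented here, and it assumes each battery is simply either good or failed.

```python
# Hypothetical model of the four-battery RTC/NVM backup assembly.
# Assumption: each battery is either good or failed (no partial charge),
# and the backup circuit holds data while at least three are good.
TOTAL_BATTERIES = 4
REQUIRED_BATTERIES = 3


def rtc_nvm_retained(good_batteries: int) -> bool:
    """Return True if the assembly can still maintain RTC/NVM data."""
    if not 0 <= good_batteries <= TOTAL_BATTERIES:
        raise ValueError("good_batteries must be between 0 and 4")
    return good_batteries >= REQUIRED_BATTERIES


def failures_tolerated() -> int:
    """How many batteries can fail before RTC/NVM data is at risk."""
    return TOTAL_BATTERIES - REQUIRED_BATTERIES


if __name__ == "__main__":
    for good in range(TOTAL_BATTERIES, -1, -1):
        print(f"{good} good battery(ies): data retained = {rtc_nvm_retained(good)}")
```

Under this model one battery can fail with no data loss; a second failure drops the assembly below the three-battery minimum, which is why a single failed battery is an early-warning condition rather than an immediate one.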
Overtemperature
The computer contains three dc box fans. One fan is in the I/O card cage and operates at a single
speed whenever power is applied. The other two fans have three speeds and are associated with
the power supply and the processor stack.
The power supply and the processor stack CPU contain temperature sensors. The power supply
temperature sensor controls the two three-speed fans. When the temperature in the power supply
rises above 39°C, the power supply steps both fans from low to medium speed. When the
temperature rises above 51°C, the fans are stepped from medium to high speed. When high speed
is required for proper cooling, a message is issued to the user warning that shutdown is imminent
if the temperature continues to rise.
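The fan-speed rule above amounts to a two-threshold map from the power supply sensor reading to a fan speed. The sketch below encodes only the 39°C and 51°C thresholds from the text; the function names are hypothetical, and it assumes no hysteresis because the manual does not say how the fans step back down as the supply cools.

```python
from enum import Enum

# Thresholds from the text, measured at the power supply sensor (deg C).
MEDIUM_ABOVE_C = 39.0
HIGH_ABOVE_C = 51.0


class FanSpeed(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


def fan_speed(psu_temp_c: float) -> FanSpeed:
    """Speed for both three-speed fans given the power supply temperature.

    Assumption: a pure threshold map with no hysteresis; the manual does not
    describe how (or whether) the fans step back down as the supply cools.
    """
    if psu_temp_c > HIGH_ABOVE_C:
        return FanSpeed.HIGH
    if psu_temp_c > MEDIUM_ABOVE_C:
        return FanSpeed.MEDIUM
    return FanSpeed.LOW


def shutdown_warning_needed(speed: FanSpeed) -> bool:
    """High speed is the point at which the user is warned of impending shutdown."""
    return speed is FanSpeed.HIGH


if __name__ == "__main__":
    for temp in (25.0, 45.0, 55.0):
        speed = fan_speed(temp)
        print(temp, speed.value, shutdown_warning_needed(speed))
```

Because the text says "above", a reading of exactly 39°C still leaves the fans at low speed in this sketch.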
The power supply shuts down when the temperature at the power supply sensor exceeds 97°C or the
temperature at the processor stack CPU sensor exceeds 100°C. The TEMP indicator on the service
panel turns on if the stack temperature is too high; the SEC BOARD LED on the power supply
assembly lights if a power supply overtemperature condition exists.
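The two shutdown conditions and their indicators can be summarized as a small status function. This is a hypothetical sketch: `thermal_status` and `ThermalStatus` are invented names, and it assumes that "too high" for the TEMP indicator means the same 100°C stack threshold that triggers shutdown, which the manual does not state outright.

```python
from dataclasses import dataclass

# Shutdown thresholds from the text (deg C).
PSU_SHUTDOWN_ABOVE_C = 97.0     # power supply sensor
STACK_SHUTDOWN_ABOVE_C = 100.0  # processor stack CPU sensor


@dataclass
class ThermalStatus:
    shutdown: bool        # power supply shuts down
    temp_indicator: bool  # TEMP indicator on the service panel
    sec_board_led: bool   # SEC BOARD LED on the power supply assembly


def thermal_status(psu_temp_c: float, stack_temp_c: float) -> ThermalStatus:
    """Hypothetical evaluation of the two overtemperature conditions.

    Assumption: "too high" for the TEMP indicator is taken to mean the same
    100 deg C stack threshold that triggers shutdown.
    """
    psu_over = psu_temp_c > PSU_SHUTDOWN_ABOVE_C
    stack_over = stack_temp_c > STACK_SHUTDOWN_ABOVE_C
    return ThermalStatus(
        shutdown=psu_over or stack_over,
        temp_indicator=stack_over,
        sec_board_led=psu_over,
    )
```

For example, `thermal_status(98.0, 70.0)` reports a shutdown with the SEC BOARD LED lit and the TEMP indicator off, which points the troubleshooter at the power supply rather than the processor stack.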
Memory Errors
The processor stack memory controller chip detects all single- and double-bit RAM failures and
corrects single-bit failures. Detection and correction are performed at run time. When a
double-bit or larger RAM failure is detected, the CPU is notified and the entire system halts. A
message is issued indicating which memory finstrate has failed.
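The manual does not say which error-correcting code the memory controller uses. To show how a single-error-correcting, double-error-detecting (SECDED) scheme behaves, the sketch below substitutes a textbook extended Hamming (8,4) code on a 4-bit value; the real controller protects wider words, and `encode`/`decode` are hypothetical names. A "corrected" result corresponds to the single-bit case the controller fixes, and "uncorrectable" to the double-bit case that halts the system.

```python
from typing import List, Optional, Tuple

# Textbook extended Hamming (8,4) SECDED code, used here only as a stand-in:
# the manual does not specify the memory controller's actual code.
# code[0] is an overall parity bit; code[1..7] are Hamming positions, with
# parity bits at 1, 2, 4 and data bits at 3, 5, 6, 7.
DATA_POSITIONS = (3, 5, 6, 7)


def encode(nibble: int) -> List[int]:
    """Encode a 4-bit value into an 8-bit SECDED codeword (list of bits)."""
    code = [0] * 8
    for i, pos in enumerate(DATA_POSITIONS):
        code[pos] = (nibble >> i) & 1
    for p in (1, 2, 4):
        # Parity bit p covers every other position whose index has bit p set.
        code[p] = sum(code[pos] for pos in range(1, 8) if pos & p and pos != p) % 2
    code[0] = sum(code[1:]) % 2  # make total parity of all 8 bits even
    return code


def decode(code: List[int]) -> Tuple[str, Optional[int]]:
    """Return ("ok" | "corrected", value) or ("uncorrectable", None)."""
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of set-bit positions; 0 for a valid codeword
    overall_odd = sum(code) % 2 == 1
    fixed = list(code)
    if syndrome == 0 and not overall_odd:
        status = "ok"
    elif overall_odd:
        # Odd number of flips: treat as a single-bit error and correct it.
        # syndrome == 0 here means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even overall parity: double-bit error.
        return "uncorrectable", None
    value = sum(fixed[pos] << i for i, pos in enumerate(DATA_POSITIONS))
    return status, value
```

The extra overall parity bit is what separates the two cases: a single flip makes total parity odd (correctable), while two flips leave it even but produce a nonzero syndrome (detectable only).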
When a single-bit failure is detected, the error is corrected, and the location can be healed by
redirecting future accesses of that location to a location in the healer RAM of the memory
controller chip. Each memory controller has 32 locations reserved for healing RAM. When all 32
locations have been used, the CPU is notified that the healer is full. The operating system then
tests each of the healed locations to determine whether the location is still faulty or whether a
soft error caused the failure. The operating system can clear overflowed healer CAMs
(content-addressable memories) and reuse them.
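The healing cycle described above (remap, fill, retest, reclaim) can be sketched with a toy model. Only the 32-entry limit comes from the text; `MemoryController`, `heal`, and `service_full_healer` are hypothetical names, the healer is modeled as a dictionary standing in for the chip's CAM lookup, and the policy of keeping still-faulty addresses healed while freeing soft-error entries is an assumption about what the operating system does.

```python
from typing import Callable, Dict, List

HEALER_ENTRIES = 32  # from the text: 32 healing locations per memory controller


class MemoryController:
    """Toy model of a memory controller with a healer.

    Hypothetical: the healer is a dict from faulty address to data,
    standing in for the chip's associative (CAM) lookup.
    """

    def __init__(self, size: int) -> None:
        self.ram: List[int] = [0] * size
        self.healer: Dict[int, int] = {}

    def read(self, addr: int) -> int:
        # Accesses to a healed address are served by the healer, not main RAM.
        return self.healer.get(addr, self.ram[addr])

    def write(self, addr: int, value: int) -> None:
        if addr in self.healer:
            self.healer[addr] = value
        else:
            self.ram[addr] = value

    def heal(self, addr: int, corrected_value: int) -> bool:
        """Redirect future accesses of addr to a healer entry.

        Returns False when all entries are in use; this models the
        "healer full" notification to the CPU.
        """
        if addr in self.healer or len(self.healer) < HEALER_ENTRIES:
            self.healer[addr] = corrected_value
            return True
        return False


def service_full_healer(mc: MemoryController, retest_ok: Callable[[int], bool]) -> List[int]:
    """Hypothetical OS routine run after the healer fills.

    Retests each healed address with the caller-supplied retest_ok. If the
    RAM location now passes (a soft error), the data is copied back and the
    entry is freed for reuse; addresses that still fail stay healed and are
    returned.
    """
    still_faulty = []
    for addr, value in list(mc.healer.items()):
        if retest_ok(addr):
            mc.ram[addr] = value
            del mc.healer[addr]
        else:
            still_faulty.append(addr)
    return still_faulty
```

In this model, entries freed by the retest become available for later single-bit errors, while a location that keeps failing stays remapped and is a candidate for scheduled memory replacement.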
