Section 60: Reliability, Availability, and Serviceability (RAS); 60:05 Introduction - IBM 4381 Manual


Section 60: Reliability, Availability, and Serviceability (RAS)
60:05 Introduction
The objectives of the RAS features of 4381 Processors are to (1) reduce the
frequency and impact of system interruptions that are caused by hardware failure
and that necessitate a re-IPL and (2) reduce the time required to locate and repair
malfunctions. RAS features of 4381 Processors are as follows:
Hardware reliability is enhanced through use of inherently more reliable logic
technology packaging than was used in previous intermediate-scale processors.
Recovery facilities, both hardware- and program-supported, are provided to
reduce the number of failures that cause a complete system termination. This
permits deferred maintenance.
Extensive diagnostic facilities are provided that are designed to reduce problem
location and repair time.
Each availability and serviceability feature is discussed in the remainder of this
section. The following recovery/repair features are implemented in hardware and
microcode:
Automatic retry of instructions when an instruction processing function error
occurs during the execution of most instructions. Hardware reconfiguration
facilities are also implemented to permit continued system operation when solid
failures occur in certain hardware components.
ECC validity checking on processor storage to correct all single-bit and detect
all double-bit and most multiple-bit errors. Most types of double-bit errors can
also be corrected via microcode.
I/O operation retry facilities, including channel retry data provided in the
limited channel logout area (for System/370 mode) and channel/control unit
command retry procedures to correct failing I/O operations
Expanded machine check interruption facilities to support better error
recording and recovery procedures
Machine check error diagnosis (reference code generation) and logging by the
support processor to aid the customer engineer in faster problem determination
and to provide the ability to record errors even when the instruction processing
function malfunctions
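
The ECC item in the list above (correct all single-bit errors, detect all double-bit errors) can be illustrated with a small sketch. This is not the 4381's actual ECC code, which this section does not describe. It is a generic extended Hamming code over 4 data bits, chosen here only to show the single-error-correct, double-error-detect principle. The names encode and decode are hypothetical, and the sketch makes no attempt to model the microcode correction of double-bit errors mentioned above.

```python
# Illustrative SEC-DED sketch (extended Hamming code, 4 data bits, 8 code bits).
# Assumption: this generic textbook code stands in for the machine's real ECC,
# whose word width and check-bit layout are not given in this section.

def encode(nibble):
    """Encode a 4-bit value into an 8-bit codeword, returned as a list of bits."""
    d = [(nibble >> i) & 1 for i in range(4)]  # d[0] is the least significant bit
    code = [0] * 8  # index 0: overall parity; indices 1..7: Hamming positions
    code[3], code[5], code[6], code[7] = d
    code[1] = code[3] ^ code[5] ^ code[7]  # check bit covering positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]  # check bit covering positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]  # check bit covering positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1            # makes the parity of all 8 bits even
    return code

def decode(code):
    """Return (value, status); status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(code)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos  # XOR of the positions of all set bits
    overall = sum(code) & 1  # 0 when the total parity is still even
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: treat as a single-bit error. The syndrome is the failing
        # position (0 means the overall parity bit itself), so flip it back.
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with even parity: two bits flipped, detect only.
        return None, "uncorrectable"
    value = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return value, status
```

For example, flipping any one bit of encode(9) and passing the result to decode returns the value 9 with status "corrected", while flipping any two bits returns status "uncorrectable".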
