Reliability, Availability, and Serviceability (RAS) Features; 50:05 Introduction - IBM System/370 Manual

SECTION 50: RELIABILITY, AVAILABILITY, AND SERVICEABILITY (RAS) FEATURES
50:05 INTRODUCTION
With the growth of online data processing activities, as distinguished
from traditional batch accounting functions, the availability of the
data processing system becomes an essential factor in company
operations, and complete system failure is extremely disruptive.
Because of the growing frequency of online processing
and the fact that the System/370 Model 165 is designed to operate in
such an environment, IBM has provided an extensive group of advanced
reliability, availability, and serviceability features for the Model
165.
These RAS features are designed to improve the reliability of
hardware, to increase the availability of the computing system, and
to improve the serviceability of system hardware components.
The RAS features of the System/370 Model 165 are designed to reduce
the frequency and impact of system interruptions that are caused by
hardware failure and necessitate a re-IPL, as follows:
• More reliable components, such as integrated circuits with fewer
connections, will be used to improve hardware reliability.
• Recovery facilities, both hardware and programming systems, not
available on System/360 Models 65 and 75, are provided to reduce
considerably the number of failures that cause a complete system
termination.
This permits deferred maintenance.
• Repair procedures include more online diagnosis and repair of
malfunctions concurrently with normal job execution in a
multiprogramming environment in order to reduce the effect of such
repairs on system unavailable time.
Each RAS feature, recovery or repair, is discussed in the remainder
of this section.
The following recovery features are implemented in hardware:
• CPU retry of most failing CPU operations, including those caused
by a buffer malfunction
• ECC validity checking on processor storage to correct all
single-bit errors
• I/O operation retry facilities, including channel retry data and
channel/control unit command retry procedures, to correct failing
I/O operations
• Expanded machine check interrupt facilities to facilitate better
error recording and recovery procedures
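The single-bit correction that ECC checking provides can be illustrated with a small sketch. It uses a textbook Hamming(7,4) code, chosen only for brevity; this section does not describe the code width or bit layout actually used by Model 165 processor storage, and the function names below are hypothetical.

```python
# Illustrative only: a Hamming(7,4) code, NOT the Model 165 storage code.
# Codeword positions are numbered 1..7; check bits sit at positions 1, 2, 4.

def encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (list of 0/1 values)."""
    d3, d5, d6, d7 = data_bits
    p1 = d3 ^ d5 ^ d7  # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7  # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7  # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]

def correct(codeword):
    """Return (corrected codeword, failing position), position 0 if none."""
    word = list(codeword)
    syndrome = 0
    for position in range(1, 8):
        if word[position - 1]:
            syndrome ^= position  # XOR of the positions holding a 1 bit
    if syndrome:
        word[syndrome - 1] ^= 1  # a nonzero syndrome names the flipped bit
    return word, syndrome

def decode(codeword):
    """Correct at most one flipped bit, then extract the 4 data bits."""
    word, _ = correct(codeword)
    return [word[2], word[4], word[5], word[6]]

if __name__ == "__main__":
    stored = encode([1, 0, 1, 1])
    stored[4] ^= 1  # simulate a single-bit storage error at position 5
    print(decode(stored))
```

A code of this kind corrects any one flipped bit per codeword. Two or more flipped bits in the same codeword are outside what this sketch handles.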
The following recovery features are provided by programming systems:
• Recovery management support (RMS) to handle the expanded machine
check interrupt and channel retry data.
Model 165 machine check handler (MCH) and channel check handler (CCH)
routines are provided for OS MFT and MVT only.
• Error recovery procedures (ERP) to retry failing I/O device and
channel operations
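The retry idea shared by the hardware and programming recovery features above can be sketched as a bounded retry loop that records each failure for later servicing. This is a generic illustration, not the Model 165 or OS implementation: the names `TransientError`, `RetryOutcome`, and `run_with_retry`, and the retry limit of 3, are all assumptions made for the example.

```python
from dataclasses import dataclass, field

class TransientError(Exception):
    """Hypothetical stand-in for a failure that may succeed on retry."""

@dataclass
class RetryOutcome:
    succeeded: bool
    attempts: int
    error_log: list = field(default_factory=list)
    result: object = None

def run_with_retry(operation, max_attempts=3):
    """Run operation, retrying on TransientError up to max_attempts times.

    Every failure is recorded, so a recovered error can still be examined
    later (the basis for deferring maintenance rather than stopping).
    """
    error_log = []
    for attempt in range(1, max_attempts + 1):
        try:
            result = operation()
        except TransientError as exc:
            error_log.append(f"attempt {attempt}: {exc}")
            continue  # retry the failing operation
        return RetryOutcome(True, attempt, error_log, result)
    # Retries exhausted: report the failure to the caller.
    return RetryOutcome(False, max_attempts, error_log)
```

When the operation succeeds on a later attempt, the caller sees a normal result and the failures survive only in the error log. When the limit is reached, the failure is passed up for further handling.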