IBM System/370 145 Manual page 197

that alter data as they execute, so that instructions can be retried
from the point of correct execution.
When the CPU is enabled for machine check interruptions, an
interruption takes place after a CPU error occurs and is retried. If
microinstruction retry was successful, the failure need only be
recorded; if unsuccessful, programmed recovery procedures are required.

The microinstruction retry feature provides the Model 145 with the
ability to recover from intermittent CPU failures that would otherwise
cause a system halt and necessitate a re-IPL or cause an executing
program to be terminated. Corrected errors are logged by recovery
routines for later diagnosis during scheduled maintenance periods,
thereby increasing system availability.

Retry of failing CPU operations on the Model 40 is not provided by
either system hardware or programming support.

ECC VALIDITY CHECKING ON PROCESSOR AND CONTROL STORAGE
The ECC method of validity checking on both processor and control
storage provides automatic single-bit error detection and correction.
It also detects all double-bit and some multiple-bit processor and
control storage errors but does not correct them. Checking is handled
on an eight-byte basis, using an eight-bit modified Hamming code, rather
than on a single-byte basis, using a single parity bit. However, parity
checking is still used to verify other data in a Model 145 system that
is not contained in processor or control storage. Models 30 and 40 use
parity checking for main storage data verification.
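
The detect-and-correct behavior described above (correct any single-bit
error, detect any double-bit error, over 64 data bits plus eight check
bits) can be illustrated with a textbook extended Hamming (72,64) code.
This is only a sketch: the Model 145's modified Hamming code is not
specified here, so the bit layout, the check-bit assignment, and the
names `encode` and `decode` are assumptions made for illustration.

```python
# Textbook SEC-DED sketch: Hamming(71,64) plus one overall parity bit.
# NOT the Model 145's actual modified Hamming code (an assumption).

CHECK_POSITIONS = [1, 2, 4, 8, 16, 32, 64]          # 7 Hamming check bits
DATA_POSITIONS = [p for p in range(1, 72) if p not in CHECK_POSITIONS]


def encode(data):
    """Return a 72-element bit list; index 0 holds the overall parity bit."""
    assert 0 <= data < (1 << 64)
    word = [0] * 72
    for i, pos in enumerate(DATA_POSITIONS):
        word[pos] = (data >> i) & 1
    for c in CHECK_POSITIONS:
        # Even parity over every position whose index has bit c set.
        word[c] = sum(word[p] for p in range(1, 72) if p & c) % 2
    word[0] = sum(word[1:]) % 2                      # 8th check bit
    return word


def decode(word):
    """Return (status, data); data is None when the error is uncorrectable."""
    word = list(word)
    syndrome = 0
    for p in range(1, 72):
        if word[p]:
            syndrome ^= p                            # XOR of set positions
    overall_bad = sum(word) % 2 == 1                 # odd number of flipped bits

    if syndrome == 0 and not overall_bad:
        status = "ok"
    elif overall_bad and syndrome < 72:
        word[syndrome] ^= 1                          # syndrome 0: parity bit itself
        status = "corrected"
    else:
        return "uncorrectable", None                 # double-bit or worse

    data = 0
    for i, pos in enumerate(DATA_POSITIONS):
        data |= word[pos] << i
    return status, data
```

Flipping any one of the 72 bits of an encoded word yields `"corrected"`
with the original data; flipping any two yields `"uncorrectable"`, which
corresponds to the detected-but-not-corrected case in the text.
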
Data enters and leaves storage in the CPU through the storage adapter
unit, which performs ECC validity checking on each doubleword. Another
storage adapter is contained in the processor storage frame. When a
doubleword (72 bits, as shown in Figure 50.10.1) is fetched from
processor or control storage, the appropriate storage adapter unit
checks the eight-bit ECC code to validate the 64 data bits. If the data
is correct, the adapter unit generates the appropriate parity bit for
each of the eight data bytes and reformats the doubleword to look as
shown in Figure 50.10.2. If a single-bit error is detected, the
identified data bit in error is corrected automatically by the corrector
unit in the storage adapter and sent to the CPU. A corrected doubleword
is sent back to control storage but not back to processor storage. When
a doubleword is to be placed in processor storage by a program or in
control storage during microprogram loading, the storage adapter unit
strips the eight parity bits, constructs the necessary eight-bit ECC
code, and appends the code to the 64 data bits. The 72 bits are then
stored as shown in Figure 50.10.1. Additional CPU time is required to
correct a single-bit error that occurs for a fetch to control storage.
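
The conversion between the two 72-bit formats (64 data bits plus eight
ECC bits in storage, eight data bytes each with its own parity bit on
the CPU side) can be sketched as follows. The sketch covers only the
parity half of the conversion. Odd parity, the tuple representation, and
the function names are assumptions for illustration; ECC generation is
left out.

```python
# Sketch of the parity side of the storage adapter's reformatting.
# Odd parity per byte is assumed; names are hypothetical.

def odd_parity_bit(byte):
    """Parity bit that makes the count of 1s across all nine bits odd."""
    return 1 - bin(byte & 0xFF).count("1") % 2


def to_cpu_format(data_bytes):
    """Fetch path: pair each of the eight data bytes with a parity bit."""
    assert len(data_bytes) == 8
    return [(b, odd_parity_bit(b)) for b in data_bytes]


def strip_parity(cpu_word):
    """Store path: drop the eight parity bits, leaving the 64 data bits."""
    assert len(cpu_word) == 8
    return bytes(b for b, _parity in cpu_word)
```

For example, `to_cpu_format(bytes(8))` pairs every zero byte with a
parity bit of 1, and `strip_parity` recovers the original eight bytes,
ready for ECC generation on the way into storage.
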
When a single-bit storage error occurs, the hardware also determines
whether the error is intermittent or solid by retrying the storage
operation to see whether the error occurs again. With one exception,
only intermittent single-bit storage errors can cause a machine check.
When an intermittent single-bit storage error is detected and corrected
during the execution of an instruction or I/O operation, a machine check
pending latch is set on and the operation continues. At the completion
of the CPU operation, a machine check interruption occurs to allow error
recording to be done unless the CPU has been disabled for ECC correction
interruptions.

The occurrence of a machine check interruption after an intermittent
single-bit processor or control storage correction is dependent on the
setting of three ECC mode bits in a special mode register in the CPU and
on a mask bit (recovery mask) in a control register. The mode register
bits can be set by using the DIAGNOSE instruction.
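
The sequence for a corrected single-bit error (classify by retry, set
the pending latch, interrupt at completion unless disabled) can be
modeled roughly as below. This is a simplified sketch, not the
hardware's logic: the single flag `ecc_interruptions_enabled` stands in
for the combined effect of the three ECC mode bits and the recovery
mask, the unnamed exception for solid errors is ignored, and clearing
the latch at completion is an assumption.

```python
# Simplified model of machine check handling for corrected single-bit
# storage errors. Class and method names are hypothetical.
from dataclasses import dataclass


@dataclass
class CpuModel:
    # Stand-in for the ECC mode bits plus the recovery mask (assumption).
    ecc_interruptions_enabled: bool = True
    machine_check_pending: bool = False

    def on_corrected_single_bit_error(self, recurs_on_retry):
        """Classify the error by whether it recurred when retried."""
        kind = "solid" if recurs_on_retry else "intermittent"
        if kind == "intermittent":
            self.machine_check_pending = True        # operation continues
        return kind

    def complete_operation(self):
        """Return True if a machine check interruption is taken now."""
        take = self.machine_check_pending and self.ecc_interruptions_enabled
        self.machine_check_pending = False           # assumed: latch is reset
        return take
```

With the default settings, an intermittent error sets the latch and
`complete_operation()` reports an interruption; with
`ecc_interruptions_enabled=False`, the same error is corrected without
an interruption being taken.
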
A Guide to the IBM System/370 Model 145
187
