IBM System/370 Manual page 170

accomplished by programmed recovery to allow system operations to
continue whenever possible and the recording of system status for
both transient (corrected) and permanent (uncorrected) hardware errors.

MACHINE CHECK HANDLER: During IPL of a control program containing
Model 165 RMS routines, machine check mask bits are enabled, and control
logouts to occur.

MCH receives control after the occurrence of both soft and hard
machine check interrupts. When a soft machine check occurs (successful
CPU retry, single-bit processor storage error corrected, time of day
clock damage, or multiple-bit processor storage error during an I/O
operation), MCH formats a recovery report record to be written in the
system error recording data set SYS1.LOGREC. This record contains
pertinent information about the error, including pertinent data from
the logout areas, an indication of the recovery that occurred,
identification of the job, job step, and program involved in the error,
the date, and the time of day. The operator is informed of successful
CPU retries, single-bit processor storage corrections, and an error
in the time of day clock.
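
The contents of the recovery report record can be pictured with a small
Python sketch. This is an illustration only, not the MCH record layout:
the class, field, and function names are hypothetical, SYS1.LOGREC and
the operator console are modeled as plain lists, and leaving the I/O
multiple-bit case out of the operator messages is an assumption drawn
from the fact that the text names only the other three cases.

```python
from dataclasses import dataclass
from datetime import datetime
from enum import Enum


class SoftCheck(Enum):
    """The four soft machine check cases named in the text."""
    CPU_RETRY = "successful CPU retry"
    ECC_SINGLE_BIT = "single-bit processor storage error corrected"
    TOD_CLOCK = "time of day clock damage"
    IO_MULTI_BIT = "multiple-bit processor storage error during I/O"


# Cases the text says are reported to the operator (assumption: the
# I/O multiple-bit case is recorded but not announced).
OPERATOR_NOTIFIED = {
    SoftCheck.CPU_RETRY,
    SoftCheck.ECC_SINGLE_BIT,
    SoftCheck.TOD_CLOCK,
}


@dataclass(frozen=True)
class RecoveryReport:
    """Hypothetical stand-in for one recovery report record."""
    check: SoftCheck
    logout_data: bytes      # pertinent data from the logout areas
    recovery_action: str    # indication of the recovery that occurred
    job: str
    job_step: str
    program: str
    timestamp: datetime     # date and time of day


def handle_soft_check(report, logrec, console):
    """Record the report and notify the operator where the text says so."""
    logrec.append(report)
    if report.check in OPERATOR_NOTIFIED:
        console.append(f"MCH: {report.check.value}")
```

Every soft machine check produces a record in this sketch; only the
operator message is conditional.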

MCH performs an additional function when a CPU retry was necessary
because of a buffer malfunction. When an error occurs in the buffer,
as indicated in the extended logout area, MCH updates a programmed
buffer error counter. After a certain number of buffer errors occur,
the entire high-speed buffer is disabled and MCH notifies the operator
of this fact. The operator can allow the system to continue running
in degraded mode, if necessary. All CPU fetches are then made directly
to processor storage, bypassing the buffer. Alternatively, the operator
can terminate system operations and request CE diagnosis and repair
of the buffer.
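
The buffer error counter described above can be sketched as follows.
This is a minimal Python illustration under stated assumptions: the
class and method names are hypothetical, and the threshold is a
parameter because the text says only "a certain number" of errors.

```python
class BufferMonitor:
    """Hypothetical model of the programmed buffer error counter."""

    def __init__(self, threshold):
        self.threshold = threshold   # assumed value; not given in the text
        self.errors = 0
        self.buffer_enabled = True
        self.messages = []           # stands in for the operator console

    def record_buffer_error(self):
        """Count one buffer error; disable the buffer at the threshold."""
        if not self.buffer_enabled:
            return
        self.errors += 1
        if self.errors >= self.threshold:
            self.buffer_enabled = False
            self.messages.append(
                "high-speed buffer disabled; system running in degraded mode"
            )

    def fetch_source(self):
        """Where CPU fetches are served from in the current state."""
        return "buffer" if self.buffer_enabled else "processor storage"
```

Once the buffer is disabled, every fetch in this sketch goes to
processor storage, which mirrors the degraded mode the operator may
choose to accept.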

Prior to relinquishing CPU control, MCH determines whether or not
an automatic mode switch from recording mode to quiet mode should take
place if a CPU retry or an ECC correction recovery has just occurred.
The determination of whether to switch to nonrecording (quiet) mode
is made on the basis of the number of soft machine checks of a specific
type that occur during system operation. Error count thresholds are
maintained separately for successful CPU retry and successful processor
storage single-bit error corrections. The IBM-supplied threshold
values can be altered when the control program is generated.

MCH switches the system to quiet mode for either ECC corrections
only (the DIAGNOSE instruction is used to change the ECC mode bit from
full recording to quiet mode) or for both CPU retry and ECC corrections
(the System Recovery mask bit is disabled). Mode switching occurs
if the number of soft machine checks that occur during system operation
exceeds the specified error count threshold for that type (or if
SYS1.LOGREC is full). The operator is informed of the mode switch
and can switch back to recording mode at any time thereafter.
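
The threshold logic can be sketched in Python. The text does not say
which condition selects which quiet mode, so this sketch makes loud
assumptions: exceeding the ECC threshold quiets ECC corrections only,
while exceeding the CPU retry threshold or filling SYS1.LOGREC quiets
both. All names are hypothetical, and resetting the counts when the
operator returns to recording mode is also an assumption.

```python
from enum import Enum


class Mode(Enum):
    RECORDING = "recording"
    QUIET_ECC = "quiet for ECC corrections only"
    QUIET_ALL = "quiet for CPU retry and ECC corrections"


class ModeSwitcher:
    """Hypothetical model of the per-type soft machine check thresholds."""

    def __init__(self, retry_threshold, ecc_threshold):
        # Separate thresholds per type, alterable at generation time.
        self.thresholds = {"cpu_retry": retry_threshold, "ecc": ecc_threshold}
        self.counts = {"cpu_retry": 0, "ecc": 0}
        self.mode = Mode.RECORDING

    def soft_check(self, kind, logrec_full=False):
        """Count one soft check of the given kind and return the mode."""
        self.counts[kind] += 1
        exceeded = self.counts[kind] > self.thresholds[kind]
        if logrec_full or (kind == "cpu_retry" and exceeded):
            # Assumed mapping: System Recovery mask bit disabled.
            self.mode = Mode.QUIET_ALL
        elif kind == "ecc" and exceeded and self.mode is Mode.RECORDING:
            # Assumed mapping: ECC mode bit changed via DIAGNOSE.
            self.mode = Mode.QUIET_ECC
        return self.mode

    def operator_resume_recording(self):
        """Operator switches back to recording mode (counts reset: assumed)."""
        self.counts = {"cpu_retry": 0, "ecc": 0}
        self.mode = Mode.RECORDING
```

The comparison is strictly greater than the threshold, following the
wording "exceeds the specified error count threshold".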

Mode switching is implemented to attempt to prevent SYS1.LOGREC
from being filled with recovery reports when a recurring correctable
error condition exists that would cause many reports to be generated.

When a System Damage hard machine check occurs (uncorrectable or
unretryable CPU error, multiple-bit processor storage error, or a
storage protect key failure), MCH determines whether the error is one
that is correctable by programming. A multiple-bit processor storage
error or a storage protect key failure associated with CPU processing
causes control to be given to the repair portion of the program damage
assessment and repair (PDAR) routine of MCH. PDAR can repair damaged
control program storage areas by loading a new copy of the affected
module if the module is marked reentrant and refreshable (it has been
