Detecting Errors; Error Checkers, Fault Isolation Registers, And First-Failure Data Capture - IBM Power System E850C Technical Overview And Introduction


customer, an IBM service support representative (SSR), or an authorized warranty service provider.

The serviceability features that are delivered in this system provide a highly efficient service environment by incorporating the following attributes:

A design for customer setup (CSU), customer installable features (CIFs), and customer-replaceable units (CRUs)

Error Detection and Fault Isolation (ED/FI) incorporating First-Failure Data Capture (FFDC)

Converged service approach across multiple IBM server platforms

Concurrent Firmware Maintenance (CFM)

This section provides an overview of how these attributes contribute to efficient service in the progressive steps of error detection, analysis, reporting, notification, and repair found in all POWER processor-based systems.

4.5.1 Detecting errors

The first and most crucial component of a solid serviceability strategy is the ability to accurately and effectively detect errors when they occur.

Although not all errors are a threat to system availability, those that go undetected can cause problems because the system has no opportunity to evaluate them and act if necessary. POWER processor-based systems employ IBM z Systems™ server-inspired error detection mechanisms, extending from processor cores and memory to power supplies and storage devices.

4.5.2 Error checkers, fault isolation registers, and First-Failure Data Capture

IBM POWER processor-based systems contain specialized hardware detection circuitry that is used to detect erroneous hardware operations. Error checking hardware ranges from parity error detection that is coupled with Processor Instruction Retry and bus retry, to ECC correction on caches and system buses.
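As a software analogy only (the real checkers are hardware circuits), the following minimal Python sketch shows how parity detection can be coupled with a retry. The names `parity_bit`, `read_with_retry`, and `make_flaky_source` are hypothetical and are not part of any IBM interface; the retry count is an arbitrary assumption.

```python
def parity_bit(word: int) -> int:
    """Return the even-parity bit for a non-negative integer word."""
    return bin(word).count("1") % 2


def read_with_retry(read_fn, max_retries: int = 2):
    """Call read_fn until the parity check passes.

    read_fn returns (word, stored_parity). Returns (word, attempts_used).
    Raises RuntimeError if the error persists after all retries.
    """
    for attempt in range(1, max_retries + 2):
        word, stored_parity = read_fn()
        if parity_bit(word) == stored_parity:
            return word, attempt
    raise RuntimeError("parity error persisted after retries")


def make_flaky_source(word: int, bad_reads: int):
    """Hypothetical source whose first `bad_reads` reads have one flipped bit."""
    state = {"reads": 0}
    stored_parity = parity_bit(word)

    def read():
        state["reads"] += 1
        if state["reads"] <= bad_reads:
            return word ^ 0b1, stored_parity  # single-bit (transient) error
        return word, stored_parity

    return read
```

Parity alone only detects an odd number of flipped bits; it cannot say which bit is wrong, which is why the sketch retries the read rather than repairing the word. ECC, by contrast, carries enough redundancy to correct certain errors in place.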

Within the processor and memory subsystem error checkers, error-check signals are captured and stored in hardware fault isolation registers (FIRs). The associated logic circuitry is used to limit the domain of an error to the first checker that encounters the error. In this way, runtime error diagnostic tests can be deterministic so that for every check station, the unique error domain for that checker is defined and mapped to field-replaceable units (FRUs) that can be repaired when necessary.
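The idea of latching the first checker and mapping it to a FRU can be illustrated with a small Python model. This is a sketch under stated assumptions: the checker names, the `FRU_MAP` table, and the `FaultIsolationRegister` class are invented for illustration and do not describe the actual register layout of any POWER processor.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical mapping from checker (error domain) to field-replaceable unit.
FRU_MAP = {
    "core0_parity": "processor module",
    "l3_ecc": "processor module",
    "dimm_ecc": "memory DIMM",
}


@dataclass
class FaultIsolationRegister:
    checkers: List[str]          # bit position = index in this list
    bits: int = 0                # one latched bit per checker that fired
    first: Optional[str] = None  # first checker to see the error

    def report(self, checker: str) -> None:
        """Latch the checker's bit; remember only the first reporter."""
        self.bits |= 1 << self.checkers.index(checker)
        if self.first is None:
            self.first = checker

    def isolate(self) -> Optional[str]:
        """Map the first checker's domain to a FRU, or None if no error."""
        return None if self.first is None else FRU_MAP[self.first]


fir = FaultIsolationRegister(checkers=list(FRU_MAP))
fir.report("dimm_ecc")      # original fault
fir.report("core0_parity")  # downstream effect of the same fault
```

In this model, `fir.isolate()` names the memory DIMM even though a second checker also fired, because the error domain is limited to the first checker that encountered the error.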

Integral to the Power Systems design is the concept of First-Failure Data Capture (FFDC). FFDC is a technique that involves sufficient error checking stations and coordination of fault reporting so that faults are detected and the root cause of the fault is isolated. FFDC also expects that necessary fault information can be collected at the time of failure without needing to re-create the problem or run an extended tracing or diagnostics program.
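The principle of collecting fault information at the moment of failure, rather than re-creating the problem later, can be sketched in software terms. The decorator below is a hypothetical analogy in Python, not the hardware or firmware mechanism; `ffdc`, `ffdc_records`, `queue_depth`, and `transfer` are all names assumed for illustration.

```python
import functools
import time

ffdc_records = []  # captured failure data, available for later analysis


def ffdc(component, snapshot):
    """Decorator: when the wrapped call fails, record snapshot() and re-raise."""
    def decorate(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            try:
                return fn(*args, **kwargs)
            except Exception as exc:
                ffdc_records.append({
                    "component": component,
                    "error": repr(exc),
                    "state": snapshot(),      # state at the time of failure
                    "timestamp": time.time(),
                })
                raise
        return wrapper
    return decorate


queue_depth = {"value": 7}  # hypothetical state worth capturing


@ffdc("io_adapter", lambda: dict(queue_depth))
def transfer(n):
    if n > 4:
        raise ValueError("transfer too large")
    return n
```

Calling `transfer(9)` raises `ValueError` and leaves one record in `ffdc_records` that holds the component name, the error, and the state snapshot, so the failure does not have to be reproduced to be diagnosed.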

For the vast majority of faults, a good FFDC design means that the root cause is isolated at the time of the failure without intervention by a service representative. For all faults, good FFDC design still makes failure information available to the service representative. This information can be used to confirm the automatic diagnosis. More detailed information can be collected by a service representative for rare cases where the automatic diagnosis is not adequate for fault isolation.
