Memory Protection - IBM Power 720 Overview

With the instruction retry function, when an error is encountered in the core, in caches and certain logic functions, the POWER7+ processor first automatically retries the instruction. If the source of the error was truly transient, the instruction succeeds and the system can continue as before.

Before POWER6: On IBM systems prior to POWER6, such an error typically caused a checkstop.

Alternate processor retry

Hard failures are more difficult, being permanent errors that will be replicated each time that the instruction is repeated. Retrying the instruction does not help in this situation because the instruction will continue to fail.

As introduced with POWER6, POWER7+ processors have the ability to extract the failing instruction from the faulty core and retry it elsewhere in the system. The failing core is then dynamically deconfigured and scheduled for replacement.
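The two-stage recovery described above (retry on the same core, then retry on another core) can be illustrated with a small software model. This is only a sketch under stated assumptions: the real mechanism is implemented in hardware and firmware, and the names `Core`, `CoreError`, and `run_with_retry` are hypothetical, introduced here purely for illustration.

```python
from dataclasses import dataclass


class CoreError(Exception):
    """Hypothetical signal that a core detected an error while executing."""


@dataclass
class Core:
    name: str
    hard_fault: bool = False    # permanent fault: every attempt fails
    transient_faults: int = 0   # number of upcoming attempts that will fail
    configured: bool = True     # False once the core is deconfigured

    def execute(self, instruction: str) -> str:
        if self.hard_fault:
            raise CoreError(f"hard fault on {self.name}")
        if self.transient_faults > 0:
            self.transient_faults -= 1
            raise CoreError(f"transient fault on {self.name}")
        return f"{instruction} ok on {self.name}"


def run_with_retry(instruction: str, core: Core, alternate: Core) -> str:
    """Model of instruction retry followed by alternate processor retry."""
    try:
        return core.execute(instruction)
    except CoreError:
        pass
    # Stage 1: instruction retry on the same core (helps with transient errors).
    try:
        return core.execute(instruction)
    except CoreError:
        pass
    # Stage 2: the error repeated, so treat it as a hard failure. Move the
    # instruction to another core and deconfigure the failing one.
    core.configured = False
    return alternate.execute(instruction)
```

In this model, a core with a single transient fault completes the instruction on the second attempt and stays configured, whereas a core with a hard fault is marked as deconfigured and the instruction completes on the alternate core.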

Dynamic processor deallocation

Dynamic processor deallocation enables automatic deconfiguration of processor cores when patterns of recoverable core-related faults are detected. Dynamic processor deallocation prevents a recoverable error from escalating to an unrecoverable system error, which might otherwise result in an unscheduled server outage. Dynamic processor deallocation relies on the service processor's ability to use recoverable error information generated by FFDC to notify the POWER Hypervisor when a processor core reaches its predefined error limit. The POWER Hypervisor then dynamically deconfigures the failing core and notifies the system administrator that a replacement is needed. The entire process is transparent to the partition owning the failing instruction.
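The threshold-driven flow described above can be sketched as a simple model: recoverable errors are counted per core, and the hypervisor is notified once a core reaches its limit. The class names, the `error_limit` parameter, and the notification text are all assumptions made for illustration; the source does not state the actual error limit or the interfaces involved.

```python
from collections import Counter


class Hypervisor:
    """Hypothetical stand-in for the POWER Hypervisor's role in this flow."""

    def __init__(self, cores):
        self.active = set(cores)
        self.notifications = []  # messages intended for the administrator

    def deconfigure(self, core):
        # Deconfigure only once; later reports for the same core are ignored.
        if core in self.active:
            self.active.discard(core)
            self.notifications.append(
                f"core {core} deconfigured; replacement needed"
            )


class ServiceProcessor:
    """Hypothetical stand-in that counts recoverable errors per core."""

    def __init__(self, hypervisor, error_limit):
        self.hypervisor = hypervisor
        self.error_limit = error_limit  # assumed value, not from the source
        self.counts = Counter()

    def record_recoverable_error(self, core):
        self.counts[core] += 1
        if self.counts[core] >= self.error_limit:
            self.hypervisor.deconfigure(core)


hypervisor = Hypervisor(cores=[0, 1, 2, 3])
service_processor = ServiceProcessor(hypervisor, error_limit=3)
for _ in range(3):
    service_processor.record_recoverable_error(2)
```

After the third recoverable error on core 2, the model removes that core from the active set and queues a single administrator notification, while the other cores keep running.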

Single processor checkstop

As in the POWER6 processor, the POWER7+ processor provides single core check-stopping for certain processor logic, command, or control errors that cannot be handled by the availability enhancements in the preceding section.

This approach significantly reduces the probability of any one processor affecting total system availability by containing most processor checkstops to the partition that was using the processor at the time that the checkstop goes into effect.

Even with all of these availability enhancements in play to prevent processor errors from affecting system-wide availability, some errors can still result in a system-wide outage.

4.2.3 Memory protection

A memory protection architecture that provides good error resilience for a relatively small L1 cache might be inadequate for protecting the much larger system main store. Therefore, a variety of protection methods are used in all POWER processor-based systems to avoid uncorrectable errors in memory.

Memory protection plans must take into account many factors, including these items:

Size

Desired performance

Memory array manufacturing characteristics

Chapter 4. Continuous availability and manageability

This manual is also suitable for:

Power 740
