Predictive Failure Analysis (Pfa); Disk Scrubbing; Disk Path Redundancy - IBM TotalStorage DS6000 Series Redbooks

Concepts and architecture


DDM, then approximately half of the 146 GB DDM would be wasted since that space is not needed. The failed 73 GB DDM will be replaced with a new 73 GB DDM, so the DS6000 microcode will most likely migrate the data from the 146 GB DDM onto the newly installed 73 GB DDM. When this process completes, the 73 GB DDM will rejoin the array and the 146 GB DDM will become the spare again. Another example would be if we fail a 10k RPM DDM onto a 15k RPM DDM. While this means that the data has now moved to a faster DDM, the replacement DDM will be the same as the failed DDM, so the spare will now be a 10k RPM DDM. This could later result in a 15k RPM DDM being spared onto a 10k RPM DDM, which is not desirable. Again, a smart failback of the spare will be performed once a suitable replacement DDM has been made available.
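The failback reasoning above can be sketched as a simple comparison. This is only an illustration of the two examples in the text, not the DS6000 microcode: the `DDM` class and the `needs_failback` function are hypothetical names, and the rule (fail back when the spare in use is larger or faster than the failed DDM) is an assumption drawn from those examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DDM:
    """Hypothetical description of a disk drive module."""
    capacity_gb: int
    rpm: int


def needs_failback(failed: DDM, spare_in_use: DDM) -> bool:
    """Return True when the spare now holding the data is larger or
    faster than the failed DDM, so migrating the data back onto a
    like-for-like replacement restores the better drive as the spare."""
    return (spare_in_use.capacity_gb > failed.capacity_gb
            or spare_in_use.rpm > failed.rpm)


# The two cases from the text, plus a like-for-like spare.
print(needs_failback(DDM(73, 10_000), DDM(146, 10_000)))  # capacity mismatch
print(needs_failback(DDM(73, 10_000), DDM(73, 15_000)))   # speed mismatch
print(needs_failback(DDM(73, 10_000), DDM(73, 10_000)))   # identical drives
```

In this sketch, only the like-for-like case leaves the spare where it is; the other two would trigger a migration back once the replacement DDM is installed.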

Hot-pluggable DDMs

Replacement of a failed drive does not affect the operation of the DS6000 because the drives are fully hot-pluggable. Because each disk plugs into a switch, there is no loop break associated with the removal or replacement of a disk. In addition, there is no potentially disruptive loop initialization process.

3.3.4 Predictive Failure Analysis (PFA)

The drives used in the DS6000 incorporate Predictive Failure Analysis (PFA) and can anticipate certain forms of failure by keeping internal statistics of read and write errors. If the error rates exceed predetermined threshold values, the drive is nominated for replacement. Because the drive has not yet failed, data can be copied directly to a spare drive, which avoids using RAID-5 or RAID-10 recovery to reconstruct all of the data onto the spare drive. The DS6000 will alert the user and can also use call home e-mail notification.
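A minimal sketch of the threshold idea follows. The real statistics and thresholds are internal to the drive firmware and are not given here, so `DriveStats`, `pfa_action`, and the value of `ERROR_RATE_THRESHOLD` are all hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: errors per I/O operation. The actual
# predetermined values used by the drives are not documented here.
ERROR_RATE_THRESHOLD = 1e-4


@dataclass
class DriveStats:
    """Hypothetical per-drive counters of the kind PFA might keep."""
    reads: int = 0
    read_errors: int = 0
    writes: int = 0
    write_errors: int = 0

    def error_rate(self) -> float:
        ops = self.reads + self.writes
        if ops == 0:
            return 0.0
        return (self.read_errors + self.write_errors) / ops


def pfa_action(stats: DriveStats, drive_failed: bool = False) -> str:
    """Pick a recovery path for one drive.

    A drive that is still working but exceeds the threshold can be
    copied directly to a spare; a drive that has already failed
    needs a full RAID reconstruction instead."""
    if drive_failed:
        return "rebuild_from_raid"
    if stats.error_rate() > ERROR_RATE_THRESHOLD:
        return "copy_to_spare"
    return "none"


healthy = DriveStats(reads=1_000_000, read_errors=5)
degrading = DriveStats(reads=1000, read_errors=1, writes=1000, write_errors=1)
print(pfa_action(healthy), pfa_action(degrading))
```

The point of the sketch is the ordering of outcomes: acting on the threshold while the drive still responds allows the cheaper direct copy rather than a RAID rebuild.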

3.3.5 Disk scrubbing

The DS6000 will periodically read all sectors on a disk. This is designed to occur without any interference to application performance. If ECC-correctable bad bits are identified, the bits are corrected immediately by the DS6000. This reduces the possibility of multiple bad bits accumulating in a sector beyond the ability of ECC to correct them. If a sector contains data that is beyond ECC's ability to correct, then RAID is used to regenerate the data and write a new copy onto a spare sector on the disk. The scrubbing process applies to both array members and spare DDMs.
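The per-sector decision described above could be sketched as follows. This is an illustrative model only: the function names, the representation of a disk as a list of bad-bit counts, and the default `ecc_limit` of 8 bits are assumptions, not DS6000 values.

```python
def scrub_action(bad_bits: int, ecc_limit: int) -> str:
    """Decide what a scrub pass does with one sector.

    bad_bits  -- number of bad bits found in the sector
    ecc_limit -- hypothetical maximum number of bits ECC can correct
    """
    if bad_bits == 0:
        return "none"
    if bad_bits <= ecc_limit:
        # Correct early, before more bad bits accumulate.
        return "correct_with_ecc"
    # Beyond ECC: regenerate from RAID and write to a spare sector.
    return "regenerate_from_raid"


def scrub(bad_bits_per_sector, ecc_limit=8):
    """Read every sector and return the repairs needed, keyed by
    sector index. Clean sectors are omitted from the result."""
    repairs = {}
    for sector, bad_bits in enumerate(bad_bits_per_sector):
        action = scrub_action(bad_bits, ecc_limit)
        if action != "none":
            repairs[sector] = action
    return repairs


# Sector 1 is still within the assumed ECC limit; sector 3 is not.
print(scrub([0, 3, 0, 12]))
```

With the assumed limit, sector 1 is repaired in place while sector 3 has to be regenerated from the rest of the array, which is the case periodic scrubbing is meant to make rare.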

3.3.6 Disk path redundancy

Each DDM in the DS6000 is attached to two 22-port SAN switches. These switches are built into the RAID or SBOD controller cards. Figure 3-5 on page 56 depicts the redundancy features of the DS6000 switched disk architecture. Each disk has two separate connections to the midplane. This allows it to be simultaneously attached to both switches. If either a RAID or SBOD controller card is removed from an enclosure, the switch that is included in that controller is also removed. However, the remaining controller retains the ability to communicate with all the disks via the remaining switch.
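The dual attachment can be modeled as a small reachability check. The sketch below is an assumption-laden illustration: the switch names, the DDM names, and the count of four disks are hypothetical, and only the two-switches-per-disk wiring comes from the text.

```python
def reachable_disks(attachments, working_switches):
    """Return the disks that still have at least one working switch.

    attachments      -- dict mapping disk name to the set of switches
                        it is wired to
    working_switches -- set of switches whose controller card is present
    """
    return {disk for disk, switches in attachments.items()
            if switches & working_switches}


# Every disk is wired to both switches (names are illustrative).
attachments = {f"ddm{i}": {"switch0", "switch1"} for i in range(4)}

# Removing one controller card removes its switch, yet all disks
# remain reachable through the other one.
after_removal = reachable_disks(attachments, {"switch1"})
print(sorted(after_removal))
```

Access is lost in this model only when both switches are gone, which matches the statement that the remaining controller can still reach all the disks.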

Figure 3-5 also shows the connection paths to the expansion enclosures. To the left and right you can see paths from the switches and Fibre Channel chipset that travel to the device adapter ports at top left and top right. These ports are depicted in Figure 2-2 on page 24. From each controller we have two paths to each expansion enclosure. This means that we can easily survive the loss of a single path (which would mean the loss of one out of four paths) due to the failure of, for instance, a cable or an optical port. We can also survive the loss of an entire RAID controller or SBOD controller (which would remove two out of four

Chapter 3. RAS
