Memory Protection - IBM BladeCenter PS700 Technical Overview And Introduction


the POWER Hypervisor when a processor core reaches its predefined error limit. The POWER Hypervisor then dynamically deconfigures the failing core, which is called out for replacement. The entire process is transparent to the partition owning the failing instruction.

If there are available inactive processor cores or capacity-on-demand (CoD) processor cores, the system puts a CoD processor core into operation after it has determined that an activated processor core is no longer operational. In this way, the server retains its total processor capacity.

If no CoD processor cores are available, system-wide total processor capacity is lowered below the licensed number of cores.
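The following Python sketch models the sparing decision described above. It is a toy model under stated assumptions, not POWER Hypervisor code: the error threshold of 3, the Core and Hypervisor classes, and the report_error method are all hypothetical, because this section gives neither the actual predefined error limit nor any firmware interface.

```python
from dataclasses import dataclass

# Hypothetical value: the real predefined error limit is set by firmware
# and is not given in this section.
ERROR_THRESHOLD = 3


@dataclass
class Core:
    core_id: int
    active: bool             # True if the core is activated and running work
    errors: int = 0          # recoverable errors reported so far
    deconfigured: bool = False


class Hypervisor:
    """Toy model of dynamic processor deallocation with CoD sparing."""

    def __init__(self, cores: list[Core], licensed: int) -> None:
        self.cores = cores
        self.licensed = licensed  # number of cores the owner is entitled to run

    def active_cores(self) -> int:
        return sum(1 for c in self.cores if c.active)

    def report_error(self, core_id: int) -> str:
        core = self.cores[core_id]
        if core.deconfigured:
            return f"core {core_id} already deconfigured"
        core.errors += 1
        if core.errors < ERROR_THRESHOLD:
            return "logged"
        # Error limit reached: take the core offline and call it out for replacement.
        core.active = False
        core.deconfigured = True
        # Bring an inactive, healthy (CoD) core into operation if one exists.
        for spare in self.cores:
            if not spare.active and not spare.deconfigured:
                spare.active = True
                return f"core {core_id} replaced by spare {spare.core_id}"
        # No spare left: the system continues with one fewer active core.
        return f"core {core_id} deconfigured, no spare available"
```

In this model, a system with three licensed cores and one CoD spare still has three active cores after the first core failure, and drops to two, below the licensed number, after a second failure when no spare remains.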

Single processor checkstop

As in POWER6, POWER7 provides single processor checkstopping for certain processor logic, command, or control errors that cannot be handled by the availability enhancements mentioned in "Dynamic processor deallocation" on page 103.

This reduces the probability of any one processor affecting total system availability by containing most processor checkstops to the partition that was using the processor at the time the full checkstop goes into effect.

Even with all these availability enhancements to prevent processor errors from affecting system-wide availability, errors might still result in a system-wide outage.

4.3.3 Memory protection

A memory protection architecture that provides good error resilience for a relatively small L1 cache might be inadequate for protecting the much larger system main store. Therefore, a variety of protection methods are used in POWER processor-based systems to avoid uncorrectable errors in memory.

Memory protection plans must take into account many factors:
Size
Desired performance
Memory array manufacturing characteristics

POWER7 processor-based systems have a number of protection schemes designed to prevent, protect against, or limit the effect of errors in main memory. These include the following capabilities:

64-byte ECC code

This innovative ECC algorithm from IBM Research allows a full 8-bit device kill to be corrected dynamically. This ECC mechanism works across DIMM pairs on a rank basis. (Depending on its size, a DIMM might have one, two, or four ranks.) With this ECC code, an entirely bad DRAM chip can be marked as bad (chip mark). After the DRAM is marked as bad, the code corrects all the errors in it. The code can additionally mark a 2-bit symbol as bad and correct it. In addition to detecting or correcting a chipkill event, the code provides double-error-detect/single-error-correct ECC or a better level of protection.
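The chip-mark idea can be illustrated with a much simpler code than the one described above. In the following Python sketch, which is an illustrative stand-in and not IBM's 64-byte ECC, each hypothetical DRAM chip contributes one byte and a single extra chip holds the XOR parity of the others; the function names are also hypothetical. Because the chip mark records which chip is bad, its contents can be treated as an erasure and rebuilt from the remaining chips, whatever values the failed chip returns. The real code is far stronger: it also corrects an additional bad 2-bit symbol and can correct errors whose location is not known in advance, which a single XOR parity chip cannot.

```python
from functools import reduce
from operator import xor


def encode_chips(chip_bytes: list[int]) -> list[int]:
    """Append a parity 'chip' holding the XOR of all data chips (one byte each)."""
    return chip_bytes + [reduce(xor, chip_bytes, 0)]


def rebuild_marked_chip(codeword: list[int], marked: int) -> list[int]:
    """Return the data chips with the chip at index `marked` rebuilt.

    The chip mark says where the failure is, so the bad chip's contents
    are an erasure: XOR-ing every other symbol (parity included)
    reproduces the original byte, whatever the failed chip now returns.
    """
    others = (s for i, s in enumerate(codeword) if i != marked)
    repaired = list(codeword)
    repaired[marked] = reduce(xor, others, 0)
    return repaired[:-1]  # drop the parity chip, keep the data chips
```

For example, encoding the bytes 0x12, 0x34, 0x56, and 0x78, overwriting chip 2 with 0xFF, and calling rebuild_marked_chip(codeword, 2) returns the original four bytes.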

Hardware-assisted memory scrubbing

Memory scrubbing is a method for dealing with intermittent errors. IBM POWER processor-based systems periodically address all memory locations. Any memory locations with a correctable error are rewritten with the correct data.
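The scrubbing loop can be sketched as follows. This section does not state which ECC the scrubber relies on, so this Python example assumes a textbook extended Hamming (8,4) single-error-correct, double-error-detect code over hypothetical 4-bit words; the hamming_encode, hamming_decode, hamming_data, and scrub functions are likewise hypothetical. The scrubber visits every location, rewrites words that contain a correctable single-bit error, and only counts uncorrectable (double-bit) errors, which a real system would have to handle by other means.

```python
def hamming_encode(nibble: int) -> list[int]:
    """Extended Hamming (8,4): index 0 is overall parity, 1..7 classic Hamming."""
    d1, d2, d3, d4 = [(nibble >> i) & 1 for i in range(4)]
    bits = [0, 0, 0, d1, 0, d2, d3, d4]
    bits[1] = d1 ^ d2 ^ d4            # covers positions 1, 3, 5, 7
    bits[2] = d1 ^ d3 ^ d4            # covers positions 2, 3, 6, 7
    bits[4] = d2 ^ d3 ^ d4            # covers positions 4, 5, 6, 7
    bits[0] = sum(bits[1:]) % 2       # overall parity enables double-error detection
    return bits


def hamming_data(bits: list[int]) -> int:
    """Extract the 4 data bits from a codeword."""
    return bits[3] | bits[5] << 1 | bits[6] << 2 | bits[7] << 3


def hamming_decode(bits: list[int]) -> tuple[str, list[int]]:
    """Classify a codeword and return a corrected copy when possible."""
    syndrome = 0
    for pos in range(1, 8):
        if bits[pos]:
            syndrome ^= pos           # XOR of positions holding a 1
    parity_ok = sum(bits) % 2 == 0
    fixed = list(bits)
    if syndrome == 0 and parity_ok:
        return "ok", fixed
    if not parity_ok:
        # Single-bit error; syndrome 0 means the overall parity bit itself flipped.
        fixed[syndrome] ^= 1
        return "corrected", fixed
    return "uncorrectable", fixed     # even parity but nonzero syndrome: two errors


def scrub(memory: list[list[int]]) -> dict[str, int]:
    """Visit every location and write back any word with a correctable error."""
    stats = {"ok": 0, "corrected": 0, "uncorrectable": 0}
    for addr, word in enumerate(memory):
        status, fixed = hamming_decode(word)
        stats[status] += 1
        if status == "corrected":
            memory[addr] = fixed      # rewrite the location with the correct data
    return stats
```

Scrubbing matters because it repairs a single-bit error before a second error in the same word can turn it into an uncorrectable double-bit error.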


IBM BladeCenter PS700, PS701, and PS702 Technical Overview and Introduction
