Memory Protection - IBM BladeCenter PS700 Technical Overview And Introduction

the POWER Hypervisor when a processor core reaches its predefined error limit. The
POWER Hypervisor then dynamically deconfigures the failing core, which is called out for
replacement. The entire process is transparent to the partition owning the failing instruction.
If inactivated processor cores or capacity-on-demand (CoD) processor cores are
available, the system puts one of them into operation after it has determined
that an activated processor is no longer operational. In this way, the server
retains its total processor capacity.
If no CoD processor cores are available, system-wide total processor capacity
drops below the licensed number of cores.
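The deallocation and substitution behavior described above can be sketched as a small simulation. This is only an illustrative model, not the POWER Hypervisor's actual logic: the names `Core`, `System`, and `report_recoverable_error`, and the threshold value `ERROR_LIMIT`, are assumptions made for the example, because the text does not state the real error limit or any interface.

```python
from dataclasses import dataclass, field

# Hypothetical threshold; the text only says "predefined error limit".
ERROR_LIMIT = 3


@dataclass
class Core:
    core_id: int
    active: bool = True          # False models an inactive or CoD spare core
    deconfigured: bool = False   # True once the core is taken out of service
    recoverable_errors: int = 0


@dataclass
class System:
    licensed_cores: int
    cores: list = field(default_factory=list)  # index in the list == core_id

    def active_count(self) -> int:
        """Number of cores currently doing work."""
        return sum(1 for c in self.cores if c.active and not c.deconfigured)

    def report_recoverable_error(self, core_id: int) -> None:
        """Count a recoverable error and deconfigure the core at the limit."""
        core = self.cores[core_id]
        if core.deconfigured:
            return
        core.recoverable_errors += 1
        if core.recoverable_errors >= ERROR_LIMIT:
            self._deconfigure(core)

    def _deconfigure(self, core: Core) -> None:
        # Take the failing core out of service (it would be called out
        # for replacement at this point).
        core.active = False
        core.deconfigured = True
        # Activate one spare core, if any, so total capacity is retained.
        for spare in self.cores:
            if not spare.active and not spare.deconfigured:
                spare.active = True
                break
```

With four licensed cores plus one spare, the first core to reach the limit is replaced by the spare and `active_count()` stays at 4. A second failing core has no spare left, so `active_count()` falls to 3, below the licensed number.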
Single processor checkstop
As in POWER6, POWER7 provides single processor checkstopping for certain processor
logic, command, or control errors that cannot be handled by the availability enhancements
mentioned in "Dynamic processor deallocation" on page 103.
This reduces the probability of any one processor affecting total system availability by
containing most processor checkstops to the partition that was using the processor at the
time the full checkstop goes into effect.
Even with all these availability enhancements to prevent processor errors from affecting
system-wide availability, some errors might still result in a system-wide outage.

4.3.3 Memory protection

A memory protection architecture that provides good error resilience for a relatively small L1
cache might be inadequate for protecting the much larger system main store. Therefore, a
variety of protection methods are used in POWER processor-based systems to avoid
uncorrectable errors in memory.
Memory protection plans must take into account many factors:
Size
Desired performance
Memory array manufacturing characteristics
POWER7 processor-based systems have a number of protection schemes designed to
prevent, protect against, or limit the effect of errors in main memory. These include the
following capabilities:
64-byte ECC code
This innovative ECC algorithm from IBM Research allows a full 8-bit device kill to be
corrected dynamically. The ECC mechanism works across DIMM pairs on a rank
basis. (Depending on its size, a DIMM might have one, two, or four ranks.) With this ECC
code, an entirely bad DRAM chip can be marked as bad (chip mark). After the
DRAM is marked as bad, the code corrects all the errors in the bad DRAM. The code can
additionally mark a 2-bit symbol as bad and correct it. This provides double-error-detect,
single-error-correct ECC or a better level of protection in addition to the detection or
correction of a chipkill event.
Hardware assisted memory scrubbing
Memory scrubbing is a method for dealing with intermittent errors. IBM POWER
processor-based systems periodically address all memory locations. Any memory
locations with a correctable error are rewritten with the correct data.
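To make the two capabilities above more concrete, the following sketch combines a single-error-correct, double-error-detect (SEC-DED) code with a scrubbing pass. It is not IBM's 64-byte ECC code, which the text does not specify in detail. It uses a textbook extended Hamming(8,4) code as a stand-in, and the function names `encode`, `decode`, and `scrub` are assumptions chosen for this example.

```python
def encode(nibble: int) -> list:
    """Encode 4 data bits into an 8-bit extended Hamming codeword.

    Index 0 holds the overall parity bit; indexes 1..7 are the classic
    Hamming positions (parity at 1, 2, 4 and data at 3, 5, 6, 7).
    """
    bits = [0] * 8
    bits[3], bits[5], bits[6], bits[7] = [(nibble >> i) & 1 for i in range(4)]
    bits[1] = bits[3] ^ bits[5] ^ bits[7]
    bits[2] = bits[3] ^ bits[6] ^ bits[7]
    bits[4] = bits[5] ^ bits[6] ^ bits[7]
    bits[0] = sum(bits[1:]) % 2          # makes the whole word even parity
    return bits


def decode(bits: list) -> tuple:
    """Return (status, nibble); status is 'ok', 'corrected' or 'uncorrectable'."""
    word = list(bits)
    syndrome = 0
    for pos in range(1, 8):
        if word[pos]:
            syndrome ^= pos              # XOR of the positions of set bits
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd overall parity: exactly one bit flipped. A zero syndrome
        # means the flipped bit is the overall parity bit itself.
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", None
    nibble = word[3] | (word[5] << 1) | (word[6] << 2) | (word[7] << 3)
    return status, nibble


def scrub(memory: list) -> dict:
    """Visit every location and rewrite those with a correctable error."""
    report = {"corrected": [], "uncorrectable": []}
    for address, word in enumerate(memory):
        status, nibble = decode(word)
        if status == "corrected":
            memory[address] = encode(nibble)   # write back the clean codeword
            report["corrected"].append(address)
        elif status == "uncorrectable":
            report["uncorrectable"].append(address)
    return report
```

In this model, flipping one bit of a stored codeword and then calling `scrub` restores the location to its original contents, while flipping two bits in the same codeword is only reported. Rewriting single-bit errors before a second error accumulates in the same word is the reason for scrubbing periodically.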
IBM BladeCenter PS700, PS701, and PS702 Technical Overview and Introduction
