Memory Protection - IBM BladeCenter PS703 Technical Overview And Introduction


If there are available inactive processor cores or capacity-on-demand (CoD) processor
cores, the system brings a CoD processor core into operation after it has been
determined that an activated processor core is no longer operational. In this way, the
server retains its total processor power.
If there are no CoD processor cores available, system-wide total processor capacity is
reduced below the licensed number of cores.
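The substitution rule above can be modeled in a few lines. This is a hypothetical sketch, not IBM firmware logic: the `ProcessorPool` class, its fields, and the "first spare wins" choice are assumptions made only to show how a CoD spare keeps capacity at the licensed level, and how capacity falls below it once no spare remains.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ProcessorPool:
    """Hypothetical model: licensed (active) cores plus inactive CoD spare cores."""
    licensed: int
    active: set[int]
    cod_spares: list[int]
    failed: set[int] = field(default_factory=set)

    def core_failed(self, core: int) -> int | None:
        """Deconfigure a failed active core and bring a CoD spare online if one exists.

        Returns the spare that was activated, or None when no spare is left.
        """
        if core not in self.active:
            raise ValueError(f"core {core} is not active")
        self.active.remove(core)
        self.failed.add(core)
        if self.cod_spares:
            spare = self.cod_spares.pop(0)  # assumption: first available spare is used
            self.active.add(spare)          # total capacity is preserved
            return spare
        return None                         # capacity now drops below the licensed count

    @property
    def capacity(self) -> int:
        return len(self.active)

    @property
    def below_license(self) -> bool:
        return self.capacity < self.licensed


# Usage: four licensed cores and one CoD spare.
pool = ProcessorPool(licensed=4, active={0, 1, 2, 3}, cod_spares=[4])
print(pool.core_failed(2), pool.capacity, pool.below_license)  # 4 4 False
print(pool.core_failed(0), pool.capacity, pool.below_license)  # None 3 True
```

The first failure is absorbed by the spare; the second leaves the pool one core short of its license, matching the two cases described in the text.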
Single processor checkstop
As in POWER6, POWER7 provides single-processor checkstopping for certain processor
logic, command, or control errors that cannot be handled by the availability enhancements
mentioned previously.
This reduces the probability of any one processor affecting total system availability by
containing most processor checkstops to the partition that was using the processor at the
time the full checkstop goes into effect.
Even with all these availability enhancements to prevent processor errors from affecting
system-wide availability, some errors might still result in a system-wide outage.
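The containment idea can be illustrated with a small model. In this sketch, `apply_checkstop` and the use of `None` to mark a core whose error cannot be contained are hypothetical. `None` stands in for the cases the text says can still cause a system-wide outage and does not describe how the hardware classifies errors.

```python
from __future__ import annotations


def apply_checkstop(core_owner: dict[int, str | None],
                    running: set[str],
                    failed_core: int) -> set[str]:
    """Return the partitions still running after a checkstop on failed_core.

    core_owner maps each core to the partition using it when the error occurs.
    None is a hypothetical marker for an error that cannot be contained.
    """
    owner = core_owner[failed_core]
    if owner is None:
        return set()              # uncontained: system-wide outage
    return running - {owner}      # contained: only the owning partition stops


owners = {0: "lpar1", 1: "lpar1", 2: "lpar2", 3: None}
print(sorted(apply_checkstop(owners, {"lpar1", "lpar2"}, 2)))  # ['lpar1']
print(apply_checkstop(owners, {"lpar1", "lpar2"}, 3))          # set()
```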

4.3.3 Memory protection

A memory protection architecture that provides good error resilience for a relatively small L1
cache might be inadequate for protecting the much larger system main store. Therefore, a
variety of protection methods are used in POWER processor-based systems to avoid
uncorrectable errors in memory.
Memory protection plans must take into account many factors, including:
Size
Desired performance
Memory array manufacturing characteristics
POWER7 processor-based systems have a number of protection schemes designed to
prevent, protect, or limit the effect of errors in main memory. This includes the following
capabilities:
64-byte ECC code
This innovative ECC algorithm from IBM Research allows a full 8-bit device kill to be
corrected dynamically. This ECC code mechanism works across DIMM pairs on a rank
basis. (Depending on the size, a DIMM might have one, two, or four ranks.) With this ECC
code, an entirely bad DRAM chip can be marked as bad (chip mark). After marking the
DRAM as bad, the code corrects all the errors in the bad DRAM. The code can
additionally mark a 2-bit symbol as bad and correct it, providing double-error-detect,
single-error-correct ECC or a better level of protection in addition to the detection or
correction of a chipkill event.
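Why a chip mark helps can be shown with a deliberately simplified code. This is not the IBM ECC algorithm. It is a single-parity erasure code, assumed here only for illustration: each chip in a rank contributes one 8-bit symbol, and one extra chip stores the XOR of the data symbols. Because the chip mark tells the decoder *where* the bad symbol is, one parity symbol is enough to rebuild it. A single parity symbol cannot locate an error at an unknown position or handle the additional 2-bit symbol, which is why the real code uses more check bits.

```python
from __future__ import annotations

from functools import reduce
from operator import xor


def encode(data: list[int]) -> list[int]:
    """One 8-bit symbol per data chip, plus one parity chip holding their XOR."""
    return data + [reduce(xor, data, 0)]


def correct_marked_chip(codeword: list[int], chip_mark: int) -> list[int]:
    """Rebuild the symbol of the chip marked bad and return the data symbols.

    The chip mark gives the error position, so the bad symbol is an erasure:
    the XOR of every other symbol (parity included) recovers it.
    """
    rebuilt = reduce(xor, (s for i, s in enumerate(codeword) if i != chip_mark), 0)
    repaired = list(codeword)
    repaired[chip_mark] = rebuilt
    return repaired[:-1]  # drop the parity symbol


data = [0x12, 0x34, 0x56, 0x78, 0x9A, 0xBC, 0xDE, 0xF0]
word = encode(data)
word[3] = 0x00                                # chip 3 has failed and returns garbage
print(correct_marked_chip(word, 3) == data)   # True
```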
Hardware-assisted memory scrubbing
Memory scrubbing is a method for dealing with intermittent errors. IBM POWER
processor-based systems periodically address all memory locations. Any memory
locations with a correctable error are rewritten with the correct data.
CRC
The bus transferring data between the processor and the memory uses CRC error
detection with a failed operation retry mechanism and the ability to retune bus parameters
