Cache Protection - IBM BladeCenter PS700 Technical Overview And Introduction

If an uncorrectable error is discovered in memory, the POWER Hypervisor marks the logical
memory block associated with the failing address for deallocation. If the logical memory
block is assigned to an active partition at the time of the fault, this deallocation takes
effect on the next partition reboot. In addition, the system deallocates the entire memory
group associated with the error on all subsequent system reboot operations until the memory
is repaired. This approach is intended to guard against future uncorrectable errors while
waiting for parts replacement.
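
The two deallocation scopes described above (one logical memory block on a partition reboot, the whole memory group on a system reboot) can be illustrated with a small model. This is only a sketch under stated assumptions, not the POWER Hypervisor's implementation: the class names, the method names, and the immediate withdrawal of unassigned blocks are hypothetical and do not come from the source.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class LogicalMemoryBlock:
    """Hypothetical model of one logical memory block."""
    block_id: int
    memory_group: int
    partition: Optional[str] = None  # active partition owning the block, if any
    marked: bool = False             # marked for deallocation
    available: bool = True           # still usable


@dataclass
class MemoryModel:
    """Hypothetical bookkeeping for marked blocks and faulty memory groups."""
    blocks: Dict[int, LogicalMemoryBlock]
    faulty_groups: Set[int] = field(default_factory=set)

    def report_uncorrectable_error(self, block_id: int) -> None:
        block = self.blocks[block_id]
        block.marked = True
        self.faulty_groups.add(block.memory_group)
        if block.partition is None:
            # Assumption (not in the source): an unassigned block is withdrawn at once.
            block.available = False

    def reboot_partition(self, partition: str) -> None:
        # Marked blocks owned by the rebooting partition are deallocated now.
        for block in self.blocks.values():
            if block.partition == partition and block.marked:
                block.available = False
                block.partition = None

    def reboot_system(self) -> None:
        # Every block in a faulty memory group stays out of use until repair.
        for block in self.blocks.values():
            if block.memory_group in self.faulty_groups:
                block.available = False
                block.partition = None

    def repair_group(self, group: int) -> None:
        self.faulty_groups.discard(group)
        for block in self.blocks.values():
            if block.memory_group == group:
                block.marked = False
                block.available = True


model = MemoryModel({
    0: LogicalMemoryBlock(0, memory_group=0, partition="lpar1"),
    1: LogicalMemoryBlock(1, memory_group=0, partition="lpar2"),
    2: LogicalMemoryBlock(2, memory_group=1, partition="lpar1"),
})
model.report_uncorrectable_error(0)
still_in_use = model.blocks[0].available  # the block stays in use until a reboot
model.reboot_partition("lpar1")
after_partition_reboot = [b.available for b in model.blocks.values()]
model.reboot_system()
after_system_reboot = [b.available for b in model.blocks.values()]
```

In this model, only block 0 is lost after the partition reboot, while the system reboot also removes block 1 because it shares the faulty memory group.
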
Note: Although memory page deallocation handles single cell failures, because of the
sheer size of data in a data bit line, it might be inadequate for dealing with more
catastrophic failures. Redundant bit steering continues to be the preferred method for
dealing with these types of problems.
Memory persistent deallocation
Defective memory discovered at boot time is automatically switched off. If the service
processor detects a memory fault at boot time, it marks the affected memory as bad so that it
is not used on subsequent reboots.
Upon reboot, if not enough memory is available to meet minimum partition requirements, the
POWER Hypervisor reduces the capacity of one or more partitions.
Depending on the configuration of the system, the IVM Electronic Service Agent™, OS
Service Focal Point, or BladeCenter Advanced Management Module Service Advisor
receives a notification of the failed component and triggers a service call.

4.3.4 Cache protection

POWER7 processor-based systems are designed with cache protection mechanisms,
including cache-line delete in both L2 and L3 arrays, Processor Instruction Retry and
Alternate Processor Recovery protection on L1-I and L1-D, and redundant repair bits in L1-I,
L1-D, and L2 caches, as well as L2 and L3 directories.
L1 instruction and data array protection
The POWER7 processor's instruction and data caches are protected against intermittent
errors by Processor Instruction Retry and against permanent errors by Alternate
Processor Recovery. The L1 cache is divided into sets. The POWER7 processor can deallocate
all but one set before doing a Processor Instruction Retry. In addition, faults in the Segment
Lookaside Buffer (SLB) array are recoverable by the POWER Hypervisor. The SLB is used in the
core to perform address translation calculations.
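
The split between retrying for intermittent errors and moving to another processor for permanent errors can be sketched as a simple control loop. This is a simplified illustration, not the POWER7 recovery logic: `CoreFault`, `run_instruction`, `make_executor`, the core names, and the retry count of one are all hypothetical, and the deallocation of L1 sets is not modeled.

```python
from typing import Callable, Dict, List, Tuple


class CoreFault(Exception):
    """Raised by the hypothetical execute() callback when a core reports an error."""


def run_instruction(execute: Callable[[str], int],
                    cores: List[str],
                    retries_per_core: int = 1) -> Tuple[int, str, List[str]]:
    """Return (result, core_used, log) after retrying, then trying alternate cores."""
    log: List[str] = []
    for core in cores:
        for attempt in range(1 + retries_per_core):
            try:
                result = execute(core)
            except CoreFault:
                # First fault on a core: retry the same instruction on that core.
                log.append(f"{core}: fault on attempt {attempt + 1}")
                continue
            return result, core, log
        # Retries exhausted: treat the fault as permanent and use another core.
        log.append(f"{core}: treated as permanent fault, moving to alternate core")
    raise RuntimeError("no core could complete the instruction")


def make_executor(fail_counts: Dict[str, int]) -> Callable[[str], int]:
    """Build a fake instruction that fails a set number of times per core."""
    remaining = dict(fail_counts)

    def execute(core: str) -> int:
        if remaining.get(core, 0) > 0:
            remaining[core] -= 1
            raise CoreFault(core)
        return 42

    return execute


intermittent = run_instruction(make_executor({"core0": 1}), ["core0", "core1"])
permanent = run_instruction(make_executor({"core0": 99}), ["core0", "core1"])
```

Here the intermittent fault is absorbed by a retry on `core0`, while the persistent fault causes the work to complete on `core1`.
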
L2 and L3 array protection
The L2 and L3 caches in the POWER7 processor are protected with double-bit-detect,
single-bit-correct error correction code (ECC). Single-bit errors are corrected before
the data is forwarded to the processor, and the corrected data is subsequently written back
to L2 and L3.
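
To make the double-bit-detect, single-bit-correct behavior concrete, the following sketch uses a toy extended Hamming code that protects 4 data bits with 4 check bits. The code width and bit layout are assumptions chosen for illustration; the source does not describe the actual ECC used in the POWER7 caches, and the function names `encode` and `decode` are hypothetical.

```python
from typing import List, Tuple

DATA_POSITIONS = (3, 5, 6, 7)  # codeword positions that hold data bits
PARITY_POSITIONS = (1, 2, 4)   # Hamming parity bits; position 0 is overall parity


def encode(data: List[int]) -> List[int]:
    """Encode 4 data bits into an 8-bit codeword."""
    word = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data):
        word[pos] = bit
    for p in PARITY_POSITIONS:
        # Parity bit p covers every other position whose index has bit p set.
        word[p] = sum(word[i] for i in range(1, 8) if i & p and i != p) % 2
    word[0] = sum(word[1:]) % 2  # overall parity enables double-bit detection
    return word


def decode(word: List[int]) -> Tuple[str, List[int]]:
    """Return (status, data); status is 'ok', 'corrected', or 'uncorrectable'."""
    word = list(word)
    syndrome = 0
    for i in range(1, 8):
        if word[i]:
            syndrome ^= i
    overall = sum(word) % 2
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd parity: a single bit flipped. The syndrome names its position
        # (syndrome 0 means the overall parity bit itself flipped).
        word[syndrome] ^= 1
        status = "corrected"
    else:
        # Non-zero syndrome with even overall parity: two bits flipped.
        return "uncorrectable", []
    return status, [word[i] for i in DATA_POSITIONS]


data = [1, 0, 1, 1]
codeword = encode(data)

single = list(codeword)
single[5] ^= 1            # inject a single-bit error

double = list(codeword)
double[2] ^= 1            # inject a double-bit error
double[6] ^= 1

clean_result = decode(codeword)
single_result = decode(single)
double_result = decode(double)
```

The single-bit error is repaired and the original data is returned, whereas the double-bit error is only reported, which corresponds to the uncorrectable case that triggers further handling.
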
In addition, the caches maintain a cache-line delete capability. A threshold of correctable
errors detected on a cache line can result in the data in the cache line being purged and the
cache line removed from further operation without requiring a reboot. An ECC uncorrectable
error detected in the cache can also trigger a purge and delete of the cache line. This does
not result in a loss of operation when an unmodified copy of the data is held in system
memory, because the cache line can be reloaded from main memory. Modified data is handled
through Special Uncorrectable Error handling.
Deleted L2 and L3 cache lines are marked for persistent deconfiguration on subsequent
system reboots until they can be replaced.
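
The cache-line delete policy described above can be summarized in a short sketch. It is a hypothetical model rather than the hardware's logic: the threshold value `CE_THRESHOLD`, the class and method names, and the returned action strings are assumptions, because the source does not state the real threshold or any interface.

```python
from dataclasses import dataclass, field
from typing import Dict, Set

CE_THRESHOLD = 3  # hypothetical value; the source does not give the real threshold


@dataclass
class CacheLineState:
    correctable_errors: int = 0
    deleted: bool = False


@dataclass
class CacheArray:
    """Hypothetical bookkeeping for cache-line delete in an L2 or L3 array."""
    lines: Dict[int, CacheLineState] = field(default_factory=dict)
    persistent_deconfig: Set[int] = field(default_factory=set)

    def _line(self, index: int) -> CacheLineState:
        return self.lines.setdefault(index, CacheLineState())

    def _purge_and_delete(self, index: int) -> None:
        # Remove the line from operation and remember it across reboots.
        self._line(index).deleted = True
        self.persistent_deconfig.add(index)

    def on_correctable_error(self, index: int) -> bool:
        """Count a corrected error; delete the line once the threshold is reached."""
        line = self._line(index)
        line.correctable_errors += 1
        if line.correctable_errors >= CE_THRESHOLD and not line.deleted:
            self._purge_and_delete(index)
        return line.deleted

    def on_uncorrectable_error(self, index: int, modified: bool) -> str:
        """Delete the line and report how the data is recovered."""
        self._purge_and_delete(index)
        if modified:
            return "special_uncorrectable_error_handling"
        return "reload_from_memory"

    def reboot(self) -> None:
        # Lines marked for persistent deconfiguration stay deleted after a reboot.
        self.lines = {i: CacheLineState(deleted=True) for i in self.persistent_deconfig}

    def replace_hardware(self) -> None:
        # Assumption: replacing the part clears the persistent record.
        self.persistent_deconfig.clear()
        self.lines.clear()


cache = CacheArray()
ce_results = [cache.on_correctable_error(7) for _ in range(CE_THRESHOLD)]
clean_action = cache.on_uncorrectable_error(12, modified=False)
dirty_action = cache.on_uncorrectable_error(13, modified=True)
cache.reboot()
deleted_after_reboot = sorted(i for i, s in cache.lines.items() if s.deleted)
```

In this model, line 7 is deleted only on the third correctable error, and all three deleted lines remain out of service after the reboot.
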

This manual is also suitable for:

BladeCenter PS701, BladeCenter PS702
