Cache Protection Mechanisms - IBM Power 595 Technical Overview And Introduction


processor host bridge (PHB) chip. When the PHB chip detects a problem, it rejects the data,
preventing data from being written to the I/O device. The PHB then enters a freeze mode, halting
normal operations. Depending on the model and type of I/O being used, the freeze can
include the entire PHB chip, or simply a single bridge. This results in the loss of all I/O
operations that use the frozen hardware until a power-on reset of the PHB. The impact to a
partition or partitions depends on how the I/O is configured for redundancy. In a server
configured for failover availability, redundant adapters spanning multiple PHB chips could
enable the system to recover transparently, without partition loss.
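
The freeze and failover behavior described above can be illustrated with a small model. This is only a sketch: the `Phb` class, the `redundant_write` helper, and the `faulty` parameter are hypothetical names invented for illustration and do not correspond to any IBM firmware interface.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Phb:
    """Hypothetical model of one PHB chip (not an IBM interface)."""
    name: str
    frozen: bool = False

    def write(self, data: bytes, error_detected: bool = False) -> bool:
        # A frozen PHB performs no I/O until it is reset.
        if self.frozen:
            return False
        if error_detected:
            # Reject the data and enter freeze mode.
            self.frozen = True
            return False
        return True

    def power_on_reset(self) -> None:
        # Only a power-on reset clears the freeze in this model.
        self.frozen = False


def redundant_write(paths: list[Phb], data: bytes,
                    faulty: frozenset[str] = frozenset()) -> str | None:
    """Try each redundant path in order; return the PHB name that accepted the write."""
    for phb in paths:
        if phb.write(data, error_detected=phb.name in faulty):
            return phb.name
    return None  # no working path: the partition loses this I/O
```

With two adapters on different PHB chips, a fault on the first path freezes that PHB and the write completes on the second, which mirrors the transparent recovery case in the text.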

4.2.3 Cache protection mechanisms

In POWER5 processor-based servers, the L1 instruction cache (I-cache), directory, and
instruction effective to real address translation (I-ERAT) are protected by parity. If a parity
error is detected, it is reported as a cache miss or I-ERAT miss. The cache line with parity
error is invalidated by hardware and the data is refetched from the L2 cache. If the error
occurs again (the error is solid), or if the cache reaches its soft error limit, the processor core
is dynamically deallocated and an error message for the FRU is generated.
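
The I-cache recovery sequence can be sketched as a toy model. Everything here is an assumption made for illustration: the `L1ICache` class, the `stuck_at_one` fault map, and the `SOFT_ERROR_LIMIT` value of 3 are not taken from the source, which does not state the actual limit.

```python
from __future__ import annotations

SOFT_ERROR_LIMIT = 3  # assumed value; the source does not give the real limit


def parity(word: int) -> int:
    """Even-parity bit: 1 if the number of set bits in word is odd."""
    return bin(word).count("1") & 1


class L1ICache:
    """Toy parity-protected cache backed by a dict standing in for the L2."""

    def __init__(self, l2: dict[int, int]):
        self.l2 = l2
        self.lines: dict[int, tuple[int, int]] = {}  # addr -> (word, parity bit)
        self.stuck_at_one: dict[int, int] = {}       # addr -> mask of solid faults
        self.soft_errors = 0
        self.core_deallocated = False

    def _fill(self, addr: int) -> None:
        word = self.l2[addr]
        self.lines[addr] = (word, parity(word))

    def _read_array(self, addr: int) -> tuple[int, int]:
        word, stored = self.lines[addr]
        return word | self.stuck_at_one.get(addr, 0), stored

    def inject_soft_error(self, addr: int, bit: int) -> None:
        # Flip one stored bit without updating the parity bit.
        word, stored = self.lines[addr]
        self.lines[addr] = (word ^ (1 << bit), stored)

    def fetch(self, addr: int) -> int | None:
        if addr not in self.lines:
            self._fill(addr)                  # ordinary cache miss
        word, stored = self._read_array(addr)
        if parity(word) == stored:
            return word                       # clean hit
        # Parity error: treat as a miss, invalidate the line, refetch from L2.
        self.soft_errors += 1
        del self.lines[addr]
        self._fill(addr)
        word, stored = self._read_array(addr)
        if parity(word) != stored or self.soft_errors >= SOFT_ERROR_LIMIT:
            # Solid error, or soft error limit reached: deallocate the core.
            self.core_deallocated = True
            return None
        return word
```

A transient flip is repaired by the refetch, while a stuck bit fails again immediately after the refetch and triggers deallocation.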

Although the L1 data cache (D-cache) is also parity-checked, it gets special consideration
when the threshold for correctable errors is exceeded. The error is reported as a synchronous
machine check interrupt. The error handler for this event is executed in the POWER
Hypervisor. If the error is recoverable, the POWER Hypervisor invalidates the cache (clearing
the error). If additional soft errors occur, the POWER Hypervisor will disable the failing portion
of the L1 D-cache when the system meets its error threshold. The processor core continues
to run with degraded performance. A service action error log is created so that when the
machine is booted, the failing part can be replaced. The data ERAT and translation lookaside
buffer (TLB) arrays are handled in a similar manner.

L1 instruction and data array protection

The POWER6 processor's instruction and data caches are protected against temporary
errors by using the POWER6 processor instruction retry feature and against solid failures by
alternate processor recovery, both mentioned earlier. In addition, faults in the Segment
Lookaside Buffer (SLB) array are recoverable by the POWER Hypervisor.

L2 array protection

On a POWER6 processor system, the L2 cache is protected by ECC, which provides
single-bit error correction and double-bit error detection. Single-bit errors are corrected before
the data is forwarded to the processor, and the corrected data is subsequently written back to L2.
As with the other data caches and main memory, uncorrectable errors are handled during run-time
by the special uncorrectable error handling mechanism. Correctable cache errors are logged, and
if the error count reaches a threshold, a dynamic processor deallocation event is initiated.
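
Single-bit correction with double-bit detection can be demonstrated with an extended Hamming (8,4) code. This is a minimal sketch of the general technique only: the source does not describe the code actually used in the POWER6 L2, which protects much wider words, and the `encode` and `decode` names are illustrative.

```python
from __future__ import annotations


def encode(nibble: int) -> list[int]:
    """Encode 4 data bits into 8 bits: index 0 is overall parity, 1..7 are Hamming positions."""
    d = [(nibble >> i) & 1 for i in range(4)]
    code = [0] * 8
    code[3], code[5], code[6], code[7] = d      # data bits
    code[1] = code[3] ^ code[5] ^ code[7]       # covers positions 1, 3, 5, 7
    code[2] = code[3] ^ code[6] ^ code[7]       # covers positions 2, 3, 6, 7
    code[4] = code[5] ^ code[6] ^ code[7]       # covers positions 4, 5, 6, 7
    code[0] = sum(code[1:]) & 1                 # overall parity over positions 1..7
    return code


def decode(received: list[int]) -> tuple[int | None, str]:
    """Return (data, status) where status is 'ok', 'corrected', or 'uncorrectable'."""
    code = list(received)
    syndrome = 0
    for pos in range(1, 8):
        if code[pos]:
            syndrome ^= pos                     # XOR of the positions holding a 1
    overall = sum(code) & 1                     # 0 when overall parity is consistent
    if syndrome == 0 and overall == 0:
        status = "ok"
    elif overall == 1:
        # Odd number of flips: assume one, located at the syndrome position
        # (syndrome 0 means the overall parity bit itself flipped).
        code[syndrome] ^= 1
        status = "corrected"
    else:
        # Nonzero syndrome with consistent overall parity: two bits flipped.
        return None, "uncorrectable"
    nibble = code[3] | (code[5] << 1) | (code[6] << 2) | (code[7] << 3)
    return nibble, status
```

In this sketch the "corrected" case corresponds to the path where data is repaired before being forwarded, and the "uncorrectable" case corresponds to an error that must be passed to a separate handling mechanism.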
Starting with POWER6 processor systems, the L2 cache is further protected by incorporating
a dynamic cache line delete algorithm. Up to six L2 cache lines might be automatically
deleted. Deletion of a few cache lines is unlikely to adversely affect server performance.
When six cache lines have been repaired, the L2 is marked for persistent deconfiguration on
subsequent system reboots until it can be replaced.
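
The bookkeeping behind cache line delete can be sketched as follows. The limit of six lines comes from the text; the per-line threshold `LINE_ERROR_THRESHOLD` and the `L2LineDelete` class are assumptions for illustration, since the source does not say when an individual line becomes a candidate for deletion.

```python
MAX_DELETED_LINES = 6      # from the text: up to six L2 lines may be deleted
LINE_ERROR_THRESHOLD = 2   # assumed; the source does not state this value


class L2LineDelete:
    """Toy tracker for correctable errors per L2 cache line."""

    def __init__(self) -> None:
        self.error_counts: dict[int, int] = {}
        self.deleted: set[int] = set()
        self.persistent_deconfig = False

    def report_correctable_error(self, line: int) -> None:
        if line in self.deleted:
            return  # the line is already out of service
        self.error_counts[line] = self.error_counts.get(line, 0) + 1
        over_threshold = self.error_counts[line] >= LINE_ERROR_THRESHOLD
        if over_threshold and len(self.deleted) < MAX_DELETED_LINES:
            self.deleted.add(line)  # stop using this line
            if len(self.deleted) == MAX_DELETED_LINES:
                # Repair budget exhausted: mark the L2 for deconfiguration
                # on subsequent reboots until it is replaced.
                self.persistent_deconfig = True

    def usable(self, line: int) -> bool:
        return line not in self.deleted
```

The model keeps the cache in service while lines are deleted one by one and sets the deconfiguration flag only when the sixth line is taken out of use.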

L3 cache

The L3 cache is protected by ECC and special uncorrectable error handling. The L3 cache
also incorporates technology to handle memory cell errors.
During system runtime, a correctable error is reported as a recoverable error to the service
processor. If an individual cache line reaches its predictive error threshold, the cache is