IBM Terminology Versus x86 Terminology; Error Handling - IBM Power Systems S822LC Technical Overview and Introduction

Memory subsystem
The memory subsystem has proactive memory scrubbing to prevent accumulation of
multiple single-bit errors. The ECC scheme can correct the complete failure of any one
memory module within an ECC word. After marking the module as unusable, the ECC
logic can still correct single-symbol (two adjacent bit) errors. An uncorrectable data error
at any cache level, up to main memory, is marked to prevent use of the faulty data. The
processor's memory controller and the memory buffer have retry capabilities for certain
fetch and store faults.
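
The value of proactive scrubbing can be illustrated with a small sketch. It assumes a toy Hamming(7,4) code, which corrects one flipped bit per word; the actual POWER8 ECC word layout is not described here, and the names `encode`, `decode`, and `scrub` are hypothetical. The point is only that a periodic pass repairs isolated single-bit errors before a second error lands in the same word.

```python
# Toy model: each memory word is a 7-bit Hamming(7,4) codeword.
# Assumption: this code is illustrative only, not the POWER8 ECC scheme.
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based bit positions that carry data
PARITY_POSITIONS = (1, 2, 4)    # 1-based bit positions that carry parity


def syndrome(word):
    """XOR of the 1-based positions of all set bits; 0 means the word is valid."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def encode(nibble):
    """Encode a 4-bit value as a 7-bit codeword."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (nibble >> i) & 1:
            word |= 1 << (pos - 1)
    s = syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << (p - 1)
    return word


def decode(word):
    """Extract the 4 data bits without attempting any correction."""
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> (pos - 1)) & 1:
            nibble |= 1 << i
    return nibble


def scrub(memory):
    """Walk all words, repair single-bit errors in place, return the repair count."""
    repaired = 0
    for addr, word in enumerate(memory):
        s = syndrome(word)
        if s:
            # For a single-bit error the syndrome is the position of the bad bit.
            memory[addr] = word ^ (1 << (s - 1))
            repaired += 1
    return repaired


if __name__ == "__main__":
    memory = [encode(n) for n in range(16)]
    memory[3] ^= 1 << 2          # inject one soft error
    print("repaired:", scrub(memory))
    print("data intact:", [decode(w) for w in memory] == list(range(16)))
```

If two bits flip in the same word before a scrub pass reaches it, this toy code miscorrects and silently returns wrong data, which is the accumulation that scrubbing is meant to prevent.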

2.3.2 IBM terminology versus x86 terminology

The components of the boot process have similar functions on POWER8 processor-based and
x86-based scale-out servers, but they are known by different terms.
Table 2-1 shows a quick overview of the terminology.
Table 2-1 Terminology

IBM        x86                            Description
---------  -----------------------------  -------------------------------------------------
SBE        Undisclosed                    Self-Boot Engine: Starts the boot process.
Host Boot  BIOS                           Core, Powerbus (SMP), and memory initialization.
OPAL       BIOS / VT-d / UEFI             KVM hardware abstraction, PCIe RC, IODA2 (VT-d),
                                          and open firmware.
OCC        PCU, off-chip microprocessors  Performs real-time functions, such as power
                                          management.
HBRT       N/A                            Correctable error monitoring and OCC monitoring.

2.3.3 Error handling

This section describes how the Power S822LC server handles different errors and recovery
functions. It gives general background on the techniques involved.
Processor core/cache correctable error handling
The OPAL firmware provides a hypervisor-independent and operating system-independent layer
that uses the robust error-detection and self-healing functions that are built into the
POWER8 processor and memory buffer modules.
The processor address paths and data paths are protected with parity or error-correction
codes (ECC). The control logic, state machines, and computational units have sophisticated
error detection. Soft or intermittent errors in the processor core are recovered with
processor instruction retry. Unrecoverable errors are reported as a machine check (MC).
Errors that affect the integrity of data lead to a system checkstop.
The Level 1 (L1) data and instruction caches in each processor core are parity-protected, and
data is stored through to L2 immediately. L1 caches have a retry capability for intermittent
errors and a cache set delete mechanism for handling solid failures.
The L2 and L3 caches in the POWER8 processor and L4 cache in the memory buffer chip are
protected with double-bit detect, single-bit correct ECC.
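
The double-bit detect, single-bit correct ECC used for the L2, L3, and L4 caches can be sketched with an extended Hamming(8,4) code. This is a minimal illustration under stated assumptions: real cache ECC words are much wider, the exact code used by POWER8 is not given in this section, and the names `secded_encode` and `secded_decode` are hypothetical. Seven bits form a Hamming codeword and an eighth bit holds overall parity, which is what distinguishes a correctable single-bit error from an uncorrectable double-bit error.

```python
# Extended Hamming(8,4): single-error correct, double-error detect (SECDED).
# Assumption: a toy 4-bit payload; this is not the actual POWER8 cache ECC layout.
DATA_POSITIONS = (3, 5, 6, 7)   # 1-based Hamming positions that carry data
PARITY_POSITIONS = (1, 2, 4)    # 1-based Hamming positions that carry parity
OVERALL_PARITY = 0x80           # eighth bit: parity over the whole word


def syndrome(word):
    """XOR of the 1-based positions (1..7) of all set bits in the Hamming part."""
    s = 0
    for pos in range(1, 8):
        if (word >> (pos - 1)) & 1:
            s ^= pos
    return s


def secded_encode(nibble):
    """Encode a 4-bit value as an 8-bit SECDED codeword with even overall parity."""
    word = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (nibble >> i) & 1:
            word |= 1 << (pos - 1)
    s = syndrome(word)
    for p in PARITY_POSITIONS:
        if s & p:
            word |= 1 << (p - 1)
    if bin(word).count("1") % 2:
        word |= OVERALL_PARITY
    return word


def secded_decode(word):
    """Return (status, nibble); status is 'ok', 'corrected', or 'uncorrectable'."""
    s = syndrome(word)
    odd = bin(word).count("1") % 2 == 1
    if odd:
        # Odd overall parity: exactly one bit flipped. A zero syndrome means the
        # overall parity bit itself was hit; otherwise the syndrome is the position.
        word ^= (1 << (s - 1)) if s else OVERALL_PARITY
        status = "corrected"
    elif s:
        # Even overall parity but a nonzero syndrome: two bits flipped.
        return ("uncorrectable", None)
    else:
        status = "ok"
    nibble = 0
    for i, pos in enumerate(DATA_POSITIONS):
        if (word >> (pos - 1)) & 1:
            nibble |= 1 << i
    return (status, nibble)


if __name__ == "__main__":
    w = secded_encode(0b1010)
    print(secded_decode(w))
    print(secded_decode(w ^ 0b00000100))   # one flipped bit
    print(secded_decode(w ^ 0b00010100))   # two flipped bits
```

In this sketch, the "uncorrectable" result corresponds to the case that the text says is marked or reported rather than silently used; how the hardware reports it is outside the scope of the example.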
Chapter 2. Management, Reliability, Availability, and Serviceability